

EDUCATIONAL AND PSYCHOLOGICAL 


MEASUREMENT 




Volume XXII
1962 


BOX 6907, COLLEGE STATION, DURHAM, N. C. 


EDUCATIONAL and
PSYCHOLOGICAL


MEASUREMENT


Editor: G. Frederic Kuder, Duke University 
Assistant Editor: John A. Hornaday, Greensboro College
Assistant Editor: Joan F. Hornaday 
Business Manager: Geraldine R. Thomas 


BOARD OF COOPERATING EDITORS 


Louis D. COHEN 
University of Florida 
HAROLD A. EDGERTON 


Performance Research, Incorporated 


Max D. ENGELHART 
Chicago City Junior Colleges 
E. B. GREENE 
Chrysler Corporation 
J. P. GUILFORD 
University of Southern California 
E. F. LINDQUIST 
State University of Iowa 
FREDERIC M. LORD
Educational Testing Service 
ARDIE LUBIN 


Walter Reed Army Institute 
of Research 


SAMUEL MESSICK 
Educational Testing Service 
WILLIAM B. MICHAEL
University of Southern California


M. W. RICHARDSON 
Richardson, Bellows, Henry and Co. 
JOHN H. ROHRER


Georgetown University 
School of Medicine 


P. J. RULON 

Harvard University 
DAVID SEGEL
Indiana University
C. L. SHARTLE

Ohio State University 
H. C. TAYLOR


The W. E. Upjohn Institute for 
Community Research 


THELMA G. THURSTONE
University of North Carolina
HERBERT A. TOOPS
Ohio State University
E. G. WILLIAMSON
University of Minnesota
BEN D. WOOD


Columbia University 


DOROTHY ADKINS WOOD


University of North Carolina




INDEX FOR VOLUME XXII 


ABE, CLIFFORD (WITH VICTOR B. CLINE AND JAMES M. RICHARDS,
JR.). The Validity of a Battery of Creativity Tests in a
High School Sample
AGER, JOEL (WITH ELI SALTZ AND MICHAEL REECE). Studies
of Forced-Choice Methodology: Individual Differences in
Social Desirability
ALEXANDER, L. BARTON (WITH BENJAMIN KLEINMUNTZ). Computer
Program for the Meehl-Dahlstrom MMPI Profile Rules
ALEXANDER, SHELDON (WITH THEODORE R. HUSEK). The Anxiety
Differential: Initial Steps in the Development of a
Measure of Situational Anxiety
ANDERSON, ADOLPH V. (WITH LEONARD V. GORDON). A Factor
Analysis of Interests in Certain Skilled Occupations
ANDERSON, C. C. (WITH R. E. TRAUB). A General Factor of
Social Desirability in the High School
ANGOFF, WILLIAM H. Scales With Nonmeaningful Origins and
Units of Measurement
AZEN, STANLEY (WITH EDWARD LEVONIAN). A Fortran Program
for Proportion of Variance in Multiple Regression
AZUMA, HIROSHI (WITH LEE J. CRONBACH). Internal-Consistency
Reliability Formulas Applied to Randomly Sampled
Single-Factor Tests: An Empirical Comparison

BAKER, FRANK B. A Computer Course for the Behavioral 
BODEN ЖИ бй CR КД EA ДД. 
BAKER, ROBERT L. (WITH RICHARD E. SCHUTZ). A Factor
Analysis of the Kuder Preference Record—Occupational,
Form D
BAKER, ROBERT L. (WITH RICHARD E. SCHUTZ). A Comparison
of the Factor Structure of the Kuder Occupational, Form
D, for Males and Females
BELL, RICHARD Q. Isolation of Elevation and Scatter Components
in Personality and Attitude Questionnaires
Bruni, A. W. Forty Variables Phi Coefficient Correlation and
Chi-Square Program for the Expanded IBM 650
CANISIA, SISTER M. Mathematical Ability as Related to Reasoning
and Use of Symbols
CARR, ALBERT (WITH GILBERT SAX). An Investigation of Response
Sets on Altered Parallel Forms
CATTELL, RAYMOND B. The Basis of Recognition and Interpretation
of Factors
CHRISTENSEN, P. R. (WITH J. P. GUILFORD, G. TAAFFE, AND
R. C. WILSON). Ratings Should Be Scrutinized




CLIFF, NORMAN. A Note on the Adjustment of Fourfold Tables
for “Curvilinearity”
CLINE, VICTOR B. (WITH JAMES M. RICHARDS, JR., AND CLIFFORD
ABE). The Validity of a Battery of Creativity Tests in a
High School Sample
COMREY, ANDREW L. A Study of Thirty-Five Personality Dimensions
COX, ANNA (WITH WILLIAM B. MICHAEL, ROBERT A. JONES,
ARTHUR GERSHON, MARVIN HOOVER, KENNETH KATZ, AND
DENNIS SMITH). High School Record and College Board
Scores as Predictors of Success in a Liberal Arts Program
During the Freshman Year of College
COX, F. N. A Note on Relationships Between the Yale Anxiety
and Lie Scales
CRAMER, ELLIOT M. Fitting the Normal Ogive on the IBM 650
CRONBACH, LEE J. (WITH HIROSHI AZUMA). Internal-Consistency
Reliability Formulas Applied to Randomly Sampled
Single-Factor Tests: An Empirical Comparison
DAVIS, FREDERICK B. (WITH GERALD S. LESSER AND LUCILLE
NAHEMOW). Identification of Gifted Elementary School
Children with Scientific Talent
DIERS, CAROL J. (WITH ALLEN L. EDWARDS). Social Desirability
and the Factorial Interpretation of the MMPI
DINGMAN, H. F. (WITH R. K. EYMAN AND C. E. MEYERS).
Comparison of Some Computer Techniques for Factor Analytic
Rotation
EDWARDS, ALLEN L. (WITH CAROL J. DIERS). Social Desirability
and the Factorial Interpretation of the MMPI
EYMAN, R. K. (WITH H. F. DINGMAN AND C. E. MEYERS).
Comparison of Some Computer Techniques for Factor
Analytic Rotation
FINLEY, CARMEN J. (WITH JACK THOMPSON). The Validation
of an Abbreviated Wechsler Intelligence Scale for Children
for Use with the Educable Mentally Retarded
FLEISHMAN, EDWIN A. (WITH DELMER C. NICKS). What Do
Physical Fitness Tests Measure?—A Review of Factor
Analytic Studies


FOREHAND, GARLIE A. Relationships Among Response Sets and
Cognitive Behaviors
GADDIS, L. WESLEY (WITH SARAH GABRIEL SAUNDERS AND WILLIAM
B. MICHAEL). An IBM 650 Program for Item Analysis
of Dichotomized Variables
GIBSON, H. B. Acquiescence and Suggestibility in Children
GORDON, LEONARD V. (WITH ADOLPH V. ANDERSON). A Factor
Analysis of Interests in Certain Skilled Occupations
GREENE, HAROLD E. (WITH MONROE M. LEFKOWITZ). Obtaining
Components Essential to a Number of Statistical Analyses
by Use of the IBM Accounting Machine
GREENE, HAROLD E. (WITH LEOPOLD O. WALDER AND DONNA
D. LEFKOWITZ). A Method for Deriving “Flexible” Sociomatrices
from Response Forms Appropriate to Children in
the Third Grade
GUILFORD, J. P. (WITH P. R. CHRISTENSEN, G. TAAFFE, AND
R. C. WILSON). Ratings Should Be Scrutinized
HANEY, RUSSELL (WITH WILLIAM B. MICHAEL AND ARTHUR
GERSHON). Achievement, Aptitude, and Personality Measures
as Predictors of Success in Nursing Training
HANLEY, CHARLES. The "Difficulty" of a Personality Inventory
Item
HEWITT, JOHN H. (WITH LEON A. ROSENBERG). The MMPI
as a Screening Device in an Academic Setting
HOOVER, MARVIN (WITH WILLIAM B. MICHAEL, ROBERT A.




ARTHUR GERSHON, MARVIN HOOVER, KENNETH KATZ, AND
DENNIS SMITH). High School Record and College Board
Scores as Predictors of Success in a Liberal Arts Program




LEFKOWITZ, MONROE M. (WITH HAROLD E. GREENE). Obtaining
Components Essential to a Number of Statistical Analyses
by Use of the IBM Accounting Machine
LESSER, GERALD S. (WITH FREDERICK B. DAVIS AND LUCILLE
NAHEMOW). Identification of Gifted Elementary School
Children with Scientific Talent
LEVONIAN, EDWARD (WITH STANLEY AZEN). A Fortran Program
for Proportion of Variance in Multiple Regression
LEVONIAN, EDWARD (WITH RAYMOND GREGORY). A Fortran Program
for Multiple Regression Scores
LEWIS, JOHN W. Comparing Zero-Order Correlation from
SCAT Total and Multiple Correlation from SCAT Q and
V at Southern Illinois University
LEWIS, JOHN W. Utilizing the Stepwise Multiple Regression
Procedure in Selecting Predictor Variables by Sex Group
LORD, FREDERIC M. Estimating Norms by Item-Sampling
LORD, FREDERIC M. Test Reliability—A Correction
LORR, MAURICE (WITH JAMES P. O'CONNOR). Psychotic Symptom
Patterns in a Behavior Inventory
LOTTO, GARY. CORR1 and CORR2—Correlation Routines for
the IBM 7070




MICHAEL, WILLIAM B. (WITH SARAH GABRIEL SAUNDERS AND L.
WESLEY GADDIS). An IBM 650 Program for Item Analysis of
Dichotomized Variables




NICKS, DELMER C. (WITH EDWIN A. FLEISHMAN). What Do
Physical Fitness Tests Measure?—A Review of Factor Analytic
Studies
O'CONNOR, JAMES P. (WITH MAURICE LORR). Psychotic Symptom
Patterns in a Behavior Inventory
PAYNE, DAVID A. The Concurrent and Predictive Validity of
an Objective Measure of Academic Self-Concept
RAWSON, HARVE E. (WITH SALOMON RETTIG). Controlling the
Effects of “Clouding Variables” in Multivariate Research
Design
RETTIG, SALOMON (WITH HARVE E. RAWSON). Controlling the
Effects of “Clouding Variables” in Multivariate Research
Design
RIMLAND, BERNARD. Personality Test Faking: Expressed Willingness
to Fake as Affected by Anonymity and Instructional
Set
ROSENBERG, LEON A. (WITH JOHN H. HEWITT). The MMPI as
a Screening Device in an Academic Setting
SALTZ, ELI (WITH MICHAEL REECE AND JOEL AGER). Studies of
Forced-Choice Methodology: Individual Differences in Social
Desirability
SAUNDERS, SARAH GABRIEL (WITH L. WESLEY GADDIS AND WILLIAM
B. MICHAEL). An IBM 650 Program for Item Analysis
of Dichotomized Variables
SAX, GILBERT (WITH ALBERT CARR). An Investigation of Response
Sets on Altered Parallel Forms




SAX, GILBERT. Theoretically Derived Chance Scores and Their
Normative Equivalents on a Selected Number of Standardized
Tests
SCHUTZ, RICHARD E. (WITH ROBERT L. BAKER). A Factor Analysis
of the Kuder Preference Record—Occupational, Form D
SCHUTZ, RICHARD E. (WITH ROBERT L. BAKER). A Comparison
of the Factor Structure of the Kuder Occupational, Form D,
for Males and Females
SCOTT, MARY HUGHIE (WITH KARL C. GARRISON). The Relationship
of Selected Personal Characteristics to the Needs
of College Students Preparing to Teach
SIEGEL, LAURENCE (WITH REGINALD L. JONES). The Individual
High School as a Predictor of College Academic Performance
SMITH, DENNIS (WITH WILLIAM B. MICHAEL, ROBERT A. JONES,
ANNA COX, ARTHUR GERSHON, MARVIN HOOVER, AND KENNETH
KATZ). High School Record and College Board Scores
as Predictors of Success in a Liberal Arts Program During
the Freshman Year of College
SUPER, DONALD E. (WITH JAMES G. MOWRY, JR.). Social and
Personal Desirability in the Assessment of Work Values
TAAFFE, G. (WITH J. P. GUILFORD, P. R. CHRISTENSEN, AND
R. C. WILSON). Ratings Should Be Scrutinized
THOMPSON, JACK (WITH CARMEN J. FINLEY). The Validation
of an Abbreviated Wechsler Intelligence Scale for Children
for Use with the Educable Mentally Retarded
TOIGO, ROMOLO. An IBM 650 Computer Program for the Evaluation
of a Reciprocation Index Utilizing Classroom Data
Based Upon Unlimited Choices Obtained Under a Single Sociometric
Criterion
TRAUB, R. E. (WITH C. C. ANDERSON). A General Factor of
Social Desirability in the High School
VERNON, PHILIP E. The Determinants of Reading Comprehension
VICK, MARY CATHARINE (WITH JOHN A. HORNADAY). Predicting
Grade Point Average at a Small Southern College
WALDER, LEOPOLD O. (WITH HAROLD E. GREENE AND DONNA
D. LEFKOWITZ). A Method for Deriving “Flexible” Sociomatrices
from Response Forms Appropriate to Children in
the Third Grade
WEBSTER, HAROLD. Item Sampling as a Sufficient but Unnecessary
Requirement for Precise Mental Testing
WESMAN, ALEXANDER G. Introduction to the Symposium
WEVRICK, LEONARD. Response Set in a Multiple-Choice Test
WILSON, R. C. (WITH J. P. GUILFORD, P. R. CHRISTENSEN, AND
G. TAAFFE). Ratings Should Be Scrutinized




EDUCATIONAL and
PSYCHOLOGICAL


MEASUREMENT 


Editor: G. Frederic Kuder, Duke University 
Associate Editor: John A. Hornaday, Greensboro College 
Business Manager: Geraldine R. Thomas 


BOARD OF COOPERATING EDITORS 


Louis D. COHEN
Duke University
HAROLD A. EDGERTON
Richardson, Bellows, Henry and Co.
Max D. ENGELHART
Chicago City Junior Colleges
E. B. GREENE
Chrysler Corporation
J. P. GUILFORD
University of Southern California
E. F. LINDQUIST
State University of Iowa
FREDERIC M. LORD
Educational Testing Service
ARDIE LUBIN


Walter Reed Army Institute 
of Research 


SAMUEL MESSICK
Educational Testing Service
WILLIAM B. MICHAEL 
University of Southern California 


M. W. RICHARDSON 
Richardson, Bellows, Henry and Co. 
JOHN H. ROHRER


Georgetown University 
School of Medicine 


P. J. RULON
Harvard University
DAVID SEGEL
Indiana University
C. L. SHARTLE
Ohio State University
H. C. TAYLOR


The W. E. Upjohn Institute for 
Community Research 


THELMA G. THURSTONE
University of North Carolina
HERBERT A. TOOPS
Ohio State University
E. G. WILLIAMSON
University of Minnesota
BEN D. WOOD


Columbia University 


DOROTHY ADKINS WOOD
University of North Carolina 


VOLUME TWENTY-TWO, NUMBER ONE, SPRING, 1962


ERRATA 


The following corrections should be made in "Analysis of a Doubly 
Nested Design,” by Julian C. Stanley, which appeared in the Winter 
1961 issue, Volume XXI, pages 831-837:


In the third row of Table 2 on page 834 (for p X t), insert cir? as
the second of 5 terms in the Average Value of Mean Square. 

Do the same thing for m X t in the seventh row of Table 2. 

In Table 3 on page 835 delete the inequality signs preceding 8.30, 
6.26, and 0.14. 


Note that with these changes every average value of mean square 
in Table 2 contains either the term oirt? or 40x, the former appearing
9 times and the latter 8.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT
VOL. XXII, No. 1, 1962


SYMPOSIUM: 


STANDARD SCORES FOR APTITUDE AND 
ACHIEVEMENT TESTS¹


ALEXANDER G. WESMAN, Chairman 
The Psychological Corporation 


ERIC F. GARDNER 
Syracuse University 


ROBERT L. EBEL 


Educational Testing Service 


WILLIAM H. ANGOFF 
Educational Testing Service 


JOHN C. FLANAGAN 
American Institute for Research and University of Pittsburgh 


— 


1 This symposium was held in September, 1960, at the annual convention of
the American Psychological Association in Chicago.




INTRODUCTION 


ALEXANDER G. WESMAN 
The Psychological Corporation 


During the American Psychological Association convention in
Chicago (1960), a symposium was offered on Standard Scores for 
Aptitude and Achievement Tests. It was the hope of the participants 
to bring together, for comparison and evaluation, statements of per- 
ceived virtues and limitations inherent in kinds of standard scores. 
The author of each of the following papers does, understandably, 
wish to convert the reader to an acceptance of the author’s con- 
clusions. Equally, each participant in the symposium hoped that 
the placing of arguments side by side would provoke thoughtful 
consideration of the issues. Inevitably, there are many points of 
agreement; happily, there is enough disagreement in perspectives 
and values to arouse the reader’s participation in the process of 
evaluating the proposed ideas.

The chairman asks the reader, as he asked the panel, to remem- 
ber why standard scores exist. The primary function of a score is to 
summarize information with respect to how an examinee responded 
to a series of tasks (items) placed before him. This summary is 
used as a means of understanding the examinee’s performance on 
the tasks, as a device for communicating information concerning 
the examinee to other people, and as a datum for various kinds of 
statistical manipulations. A discussion of the advantages or dis- 
advantages of one or another kind of score must take into account 
these several uses. 

If one kind of standard score facilitates interpretation, or pro- 
vides more direct understanding of the examinee’s performance 
than does some other, the former has a real advantage. If one kind
of score is especially effective for communicating the meaning of a
test score to those who have a proper interest in that performance 
—the examinee, his teacher or supervisor, his family or guidance 
counselor—that characteristic serves to make it attractive. Simi- 
larly, the ease with which the score can be manipulated statistically, 
or its pertinence with respect to special applications (such as the 
measurement of growth in ability), may be a good reason for pre- 
ferring that score to one which does not readily lend itself to such 
manipulation or application. 

It is unlikely that any one type of score will be found superior 
in all desirable characteristics. Accordingly, the proponent of a 
particular variety of standard score is, implicitly at least, expressing
a value judgment concerning the relative importance of the several 
uses of scores as well as a judgment concerning the device to be 
employed to make those uses possible. 

It is fitting, before our speakers discuss their proposed kinds of 
standard scores, to re-state for their audience a definition of
“standard score” so that we may have a common frame of discourse.
In English and English, A Comprehensive Dictionary of Psychological
and Psychoanalytical Terms, we find “standard score: 1. any
derived score using as its unit the standard deviation (or some
fraction thereof) of the population that is regarded as the criterion
group. 2. z score or z, the difference between the obtained score and
the mean, divided by the standard deviation. . . . Standard scores, 
when based on normal distributions, are for most purposes com- 
parable even though the raw scores are incomparable. Thus, by 
standard scores it is possible to show that a person can jump better 
than he can run (even though feet jumped is incommensurable with 
yards per second) or that he is more intelligent than he is emo- 
tionally stable.” 
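The second sense of the definition can be made concrete with a short computation. The sketch below is illustrative only: the function name `z_score` and the jumping and running figures are hypothetical, chosen to show how two measures in incommensurable units become comparable once each is referred to its own reference group.

```python
from statistics import mean, pstdev


def z_score(x, reference):
    """Standard score of x: (x - mean) / standard deviation of the reference group."""
    m = mean(reference)
    sd = pstdev(reference)  # standard deviation of the criterion group itself
    return (x - m) / sd


# Hypothetical reference-group data (not taken from the article).
jump_feet = [12.0, 14.0, 16.0, 18.0, 20.0]      # distance jumped, in feet
run_yards_per_sec = [5.0, 6.0, 7.0, 8.0, 9.0]   # running speed, in yards per second

# One examinee's two raw scores, each in its own unit.
z_jump = z_score(19.0, jump_feet)
z_run = z_score(7.0, run_yards_per_sec)

# The larger z marks the relatively stronger performance.
jumps_better_than_runs = z_jump > z_run
```

With these made-up figures the examinee stands about one standard deviation above the group in jumping and exactly at the group mean in running, which is the kind of comparison the dictionary entry has in mind.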




NORMATIVE STANDARD SCORES 


ERIC F. GARDNER 
Syracuse University 


A single isolated test score is of little or no value. For a score to 
have meaning and be of social or scientific utility, some sort of 
frame of reference is needed. A number of different frames of refer- 
ence have been proposed and have been found to have value. 

One possible frame of reference is the content of the test itself. 
Among derived scores one of the earliest was the per cent of a 
defined sample of tasks which an individual has completed satis- 
factorily. The deficiencies inherent in these kinds of scores have 
been discussed so many times in the literature that no attempt will be
made here to go into detail again. A few of the issues are the lack 
of comparability of per cent scores on the same test for different 
people, the lack of comparability from test to test, and the lack of 
algebraic utility. The following comments illustrate these points. 
John and Jane might each have scores of 60 per cent on the same 
test but have answered correctly very different items. A score of 
80 per cent on a hard test is obviously not comparable to a score 
of 80 per cent on an easy test. For algebraic utility, equal units 
throughout the scale are desirable. It is not reasonable to assume 
that the difference between scores of 60 per cent and 70 per cent
represents the same difference in ability as that between 90 per cent
and 100 per cent. Such scores ignore differences between items of 
the test in representativeness, difficulty and importance. Also it is 
obvious that the meaning of such scores is entirely dependent upon 
the particular sample of items included.

The content provided by the items of a test yields scores which 
may be directly related to standards set by the examiner who prepared
the test. He may regard a score as good or poor on the basis
of his judgments of the difficulties of the items and the expected 
performance of those taking the examination. Such judgments are 
difficult to make and frequently are not related to the realities of 
the situation. For example, when teachers or examining boards dis- 
cover that very large proportions of those examined have fallen 
below the standard they originally built into the examination, they 
generally revise their judgments about the test and re-evaluate the 
test results. Thus the content or “absolute” frame of reference is 
supplemented by a relative frame of reference based upon knowl- 
edge of the performance of the group of examinees. 

The inadequacy of the content frame of reference led to a con- 
sideration of additional approaches. One of the most commonly 
used frames of reference is the performance of some well-defined 
group of examinees. The College Board score scale, with a mean of 
500 and a standard deviation of 100 for a group of examinees on 
which it was established some years ago, is one type of normative 
standard score. The I.Q. and grade scores are different types of 
basically normative standard scores. 

This type of score provides a meaningful report of the examinee's 
performance in relation to those of members of a defined reference 
group. For example, it may be more useful to know how an examinee's
performance on a particular test compares with those of
his peers, than to know how it compares with the standards of the
examiners. For many purposes, such as selection, placement and
prediction, it is useful to know the location of a given score with
respect to a particular frequency distribution of scores. For example, a
grade score of 6.5 incorporates in it the information that the subject 
has obtained a score that is the same as the average of the norma- 
tive group of sixth-graders who have been half a year in school. 
An I.Q. of 100 indicates that on the particular intelligence test the
subject performed at the same level as the average of the norma- 
tive group. A standard score of 600 indicates performance which 
is one standard deviation above the mean of the normative group. 
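The arithmetic behind a scale of the College Board type can be sketched briefly. The function names and the norm-group figures (a raw-score mean of 40 and standard deviation of 8) are assumptions made for illustration; only the scale mean of 500 and standard deviation of 100 come from the text.

```python
def to_scaled(raw, norm_mean, norm_sd, scale_mean=500.0, scale_sd=100.0):
    """Linear conversion of a raw score to a scale with a chosen mean and SD."""
    z = (raw - norm_mean) / norm_sd   # position relative to the normative group
    return scale_mean + scale_sd * z


def sd_units_above_mean(scaled, scale_mean=500.0, scale_sd=100.0):
    """Read a scaled score back as standard deviations from the normative mean."""
    return (scaled - scale_mean) / scale_sd


# Hypothetical normative group: raw-score mean 40, standard deviation 8.
scaled_48 = to_scaled(48, norm_mean=40, norm_sd=8)
units_600 = sd_units_above_mean(600)
```

Under these assumptions a raw score of 48 converts to 600, and 600 reads back as one standard deviation above the normative mean.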

In most cases the test user is concerned about frames of reference 
based on both content and group performance. He is interested in 
having knowledge about the specific responses of the individual to 
the items of the test and also knowledge about the performance of 
the individual relative to that of other individuals. 


Some Desirable Properties of Items 


If we ignore practical considerations and concern ourselves with
characteristics of items that would aid in scaling and test interpretation,
there are a number of desirable properties that can be mentioned.
Some of these are difficult or impossible to obtain, while
others, if obtained, would almost certainly prevent our achieving
more important characteristics. Considering each specific issue in
isolation and simultaneously assuming that all other necessary requirements
for a good test are met, we could argue that the following
properties would be desirable:


1. The test consists of items which constitute a representative 
sample of the domain tested. It should be a sample of behaviors 
that represent the objectives which have previously been defined. 
2. The items in the test form a Guttman Scale. This property 
implies that the items selected can be ranked in the same order
of difficulty for each individual. Once the items have been so 
ranked, any examinee will answer correctly all those items of 
less difficulty and incorrectly all those of greater difficulty. Thus 
a score of 17 means that the person answered correctly the first 
seventeen items and incorrectly all others. Such an arrangement
of items permits an unambiguous interpretation of the score 17
in that all people who score 17 have answered correctly the same 
items. 

3. The items in the test can be arranged along a continuum of
the variable under consideration in such a way that the raw
scores constitute an interval scale. The items included in such a
test would have the property of representing equal differences in 
ability between adjacent items. For example, the difference in 
ability represented by scores of 53 and 54 would be the same as 
that represented by scores of 85 and 86. 

4. The items are of such nature that a zero score on the test 
represents zero amount of the ability being tested. If the condition
specified in property 3 is now added, the scale becomes a
ratio scale which is amenable to all four arithmetic operations. 
5. The items provide a scale unit which is meaningful. There are 
advantages in having the size of unit related to the standard 
error of measurement in such a way that a user has some idea 
as to the likelihood of a difference being entirely due to error. 
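Property 2 lends itself to a direct check on a table of item responses. The following is a minimal sketch, assuming responses are coded 1 (correct) and 0 (incorrect) with one row per examinee; the function names and the two small response tables are hypothetical.

```python
def order_items_by_difficulty(matrix):
    """Reorder the columns so that the items run from easiest to hardest,
    judged by the number of examinees answering each item correctly."""
    n_items = len(matrix[0])
    totals = [sum(row[j] for row in matrix) for j in range(n_items)]
    order = sorted(range(n_items), key=lambda j: -totals[j])
    return [[row[j] for j in order] for row in matrix]


def is_guttman_pattern(row):
    """True if every correct answer precedes every incorrect one."""
    score = sum(row)
    return all(r == 1 for r in row[:score]) and all(r == 0 for r in row[score:])


def forms_guttman_scale(matrix):
    """True if every examinee's pattern is consistent with one item ordering."""
    ordered = order_items_by_difficulty(matrix)
    return all(is_guttman_pattern(row) for row in ordered)


# Hypothetical response tables.
perfect = [[1, 1, 1, 0], [1, 0, 0, 0], [1, 1, 0, 0]]
imperfect = [[1, 0, 1, 0], [1, 1, 0, 0]]  # same score of 2, different items

perfect_is_scale = forms_guttman_scale(perfect)
imperfect_is_scale = forms_guttman_scale(imperfect)
```

In the second table two examinees earn the same score by answering different items correctly, so the score no longer identifies which items were passed.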




A test possessing the properties just enumerated (that is, con- 
sisting of items which (1) adequately represent the domain to be 
tested and (2) can be ranked in order of difficulty, and starting with 
an absolute zero will provide successively equal increments of knowl- 
edge) provides a raw score scale with very desirable characteristics. 
Unfortunately these properties, although desirable, are difficult to 
achieve and in many practical situations the achievement of one 
results in less success in achieving another. For example, I would
argue that property 1 is paramount for any achievement test. That
is, a good achievement test should itself define the objectives
measured. These objectives are set up by those agents of society
who are responsible for decisions concerning educational objectives,
and the test constructor must attempt to incorporate that definition
in the building of the examination. This point of view implies that
the method of scaling an educational achievement test should not
be permitted to determine the content of the test or to alter the
definition of objectives implied in the test. It is most probable that
an attempt to select items so that the raw score scale produced has
properties 2 and 3 (an interval Guttman Scale) would eliminate
from the test sample important concepts and skills.


Sampling from Populations of Items and Examinees 


This discussion so far has suggested that the interpretation of 
achievement test scores requires one to consider two very different
types of frames of reference, each associated with a particular
sampling problem.


lation. A difficulty index for a reading item obtained from a typical 
fourth grade obviously does not have the same meaning as one obtained
from a typical sixth grade. A person scoring at the eighty-fourth
percentile, or obtaining a T-score of 60 in an arithmetic test
where the score is calculated from a typical seventh grade sample, is 
not performing at the same level as one whose standing at the 
eighty-fourth percentile on the same test is calculated from a be- 
low-average seventh grade. Likewise a pupil with a vocabulary 
grade score of 6.2 obtained from a representative sample of fifth 
graders, in say, Mississippi, is certainly not comparable to a pupil 
making a score of 6.2 based on a national representative sample. By 
the same token, one would hardly expect a set of decoys for an 
arithmetic multiple-choice item to function in the same fashion in 
both a fifth and ninth grade. The importance of the particular 
reference population which is used cannot be overemphasized. 
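The dependence of a derived score on the reference population can be shown numerically. The sketch below uses a linear T-score (50 plus 10 times z), which is a simplification, and two invented seventh-grade samples; the function name and all figures are hypothetical.

```python
from statistics import mean, pstdev


def t_score(raw, norm_group):
    """Linear T-score: 50 + 10 * z, with z taken relative to norm_group."""
    z = (raw - mean(norm_group)) / pstdev(norm_group)
    return 50 + 10 * z


# Hypothetical raw-score samples from two different seventh grades.
typical_grade7 = [20, 25, 30, 35, 40]         # mean 30
below_average_grade7 = [10, 15, 20, 25, 30]   # mean 20

raw = 30
t_against_typical = t_score(raw, typical_grade7)
t_against_below_average = t_score(raw, below_average_grade7)
```

The same raw score of 30 yields a T-score of 50 against the first group and roughly 64 against the second, so a T-score reported without its norm group cannot be interpreted.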


Current Practice 


In the construction of an achievement test, the issue of the 
sampling of the items is considered under the concept of validity— 
usually content validity. Appropriate objectives are defined, tables 
of specifications are established, and trial items are constructed to 
sample the variable described.

Data are then obtained to give information about the statistical 
characteristics of the items. In the light of this additional informa- 
tion, the test is assembled in such a way that the items will sample 
both content defined by the objectives and the ability of the ex- 
aminees for which it is designed. 

Attempts are then made by scaling procedures to approximate 
some of the other desirable properties which the test does not acquire 
solely through the relationship of the items to each other. Current 
methods of scaling educational achievement tests are based upon the 
statistical properties of the test, or of the individual items consti- 
tuting the test with reference to a particular population of exami- 
nees. That is, such scales are derived from normative data. 

Raw scores on some educational achievement tests are meaningful 
in themselves in terms of content of the test. For example, a score of 
30 on a test built of 50 basic addition combinations gives some in- 
formation about the particular student without regard for the per- 
formance of any other person. However, groups of such items ar- 
ranged with reference to such a meaning do not constitute scales. 
You cannot compare 30 out of these 50 basic addition facts with
30 out of a different set, or six out of 15 rules of grammar, or with a
possible number of vocabulary items in Russian. Some frame of ref- 
erence is needed so that performance from person to person and 
group to group can be compared. The scaling job still remains to be 
done. 

Any added meaning of scaled scores is due entirely to the contribution
of the normative data, and that meaning applies, strictly
speaking, only to the particular reference population involved in
the scaling process. This statement holds whether the scale is based
solely upon item statistics or upon some operations on the total
score. Normative standard scores are dependent upon the sample of
subjects selected (Gardner, 1953, pp. 13-21).

Let us consider the role of the population in several common scaling
procedures. A familiar frame of reference is provided by the
performance of individuals in a single well-defined group on a par- 
ticular test at a particular time. Two commonly used scales have 
been derived within such a frame of reference. The simplest are 
ordinal scales, such as percentile scores, in which the scale number
describes relative position in the group. The second type are in- 
terval scales where an effort has been made to obtain algebraic 
utility by definition. The T-scores of McCall represent an interval 
scale where equal units have been defined as equal distances along
the abscissa of a postulated normal population frequency distribu- 
tion. A variation is the College Entrance Examination Board scores 
with a mean of 500 and standard deviation of 100 for the parent 
normally distributed population. 
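The two kinds of scale just described can be illustrated in a few lines. The sketch assumes a midpoint convention for percentile ranks (half of the tied scores counted as below) and uses the inverse normal function from Python's standard library; the function names and sample data are hypothetical, and the procedure is a simplified stand-in for McCall's method rather than a reproduction of it.

```python
from statistics import NormalDist


def percentile_rank(raw, group):
    """Per cent of the group scoring below raw, counting half of the ties."""
    below = sum(1 for s in group if s < raw)
    ties = sum(1 for s in group if s == raw)
    return 100.0 * (below + 0.5 * ties) / len(group)


def normalized_t(percentile):
    """T-score at the same percentile of a postulated normal distribution.
    The percentile must lie strictly between 0 and 100."""
    z = NormalDist().inv_cdf(percentile / 100.0)
    return 50 + 10 * z


# Hypothetical group of ten raw scores.
group = list(range(1, 11))
rank_of_5 = percentile_rank(5, group)

t_at_median = normalized_t(50.0)
t_at_84th = normalized_t(84.13)
```

The percentile rank is purely ordinal, while the normalized T-score places equal units along the abscissa of the assumed normal curve. A scale with mean 500 and standard deviation 100 follows from the same z by replacing 50 and 10 with 500 and 100.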

A second type of frame of reference is provided by the test performance
of individuals belonging to well-defined subgroups where
the subgroups have a specific relationship to each other within the
composite group,


provide ordinal scales which have had wide utility in the elementary 
grades. Attempts have been made to obtain the merits of an alge- 
braically manipulatable scale by utilizing ordinal relationship of 
subgroups but introducing restrictions in terms of the shape of frequency
distributions. Efforts to obtain interval scales within such
frames of reference have been made by Flanagan (1939) in the development
of Scaled Scores of the Cooperative Tests and by Gardner
(1950) in the development of K-Scores. 

Test scores are used by administrators, teachers and research 
workers to make comparisons in terms of rank, level of development, 
growth and trait differences among both individuals and groups. 
Hence many types of scales and norms have been developed de- 
pending upon the intended use. Each is consistent within itself but 
the properties of the scales are not completely consistent from one 
type of scale to another. For example, a grade scale is not appropri- 
ate for measuring growth in a function unless one is willing to accept 
the assumption that growth is linearly related to grade. The scaling 
of the Binet items involves the assumption of a linear relationship 
between Mental Age and Chronological Age. As valuable and useful 
as the Binet Scale has been for the purpose for which it was de- 
signed, it has obvious limitations when we try to infer the “true” 
nature of intellectual growth. 

It should be emphasized that the adoption of any one of the scales 
available does not exclude the use of any of the others. In fact, most 
situations require the test user to utilize more than one type of scale 
or norm for an adequate interpretation of test results. 


Conclusion 


Normative standard scores are measures obtained from scales 
having certain specific properties, and they incorporate in the 
numerical values certain information about the normative group 
used. They are obtained by statistically manipulating the raw score 
responses of a defined group of people on a defined sample of content. 
It is desirable to facilitate the interpretation of test scores by giving 
them as much direct meaning as possible. As Flanagan (1950) has 
said “. . . if much information is built into the score itself, continual 
use makes its interpretation more and more direct and immediate. 
It is also of great assistance if such fundamental built-in meanings 
can be as constant from one test to the next as possible.” However, 
the amount of meaning that can be built into any single reference 
scale will constitute only a very small part of the total amount of 
meaning to be desired by all of the test users from those results. It is
almost always necessary to supplement the knowledge inherent in
the scores with other normative data. Norms based on a variety of
different groups have considerable merit. Different types of norms
such as grade scores, percentile scores and various types of standard
scores all have their place. The case for all normative standard
scores stands or falls on their ability to provide additional and more
useful information than can be obtained from the raw scores from
which they were derived.


REFERENCES 


Flanagan, John C. “Scaled Scores.” The Cooperative Test Service
of the American Council on Education, 1939.

Flanagan, John C. “Units, Scores and Norms.” In Educational
Measurement (E. F. Lindquist, Editor). Washington, D. C.:
American Council on Education, 1950.

Gardner, Eric F. “Comments on Selected Scaling Techniques with a
Description of a New Type of Scale.” Journal of Clinical Psychology,
VI (1950), 38-42.

Gardner, Eric F. “The Importance of Reference Groups in Scaling
Procedure.” Proceedings of the 1952 Invitational Conference on
Testing Problems. Princeton, N. J.: Educational Testing Service,
1953.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 1, 1962


CONTENT STANDARD TEST SCORES 


ROBERT L. EBEL 
Educational Testing Service 


By the term content standard test score in this discussion we will
mean a number that indicates the per cent of a systematic sample
from a defined domain of tasks which an individual has performed
successfully. For example, if a school child is asked to add each of
the 100 possible pairs of single digit numbers, the number of sums
he gives correctly, which in this special case is numerically equal to
the per cent correct, is his content standard score.

If a college student is asked to match the appropriate definition 
with a sample of words selected systematically from a specified 
dictionary, the per cent of correct matchings he achieves is his 
content standard score on this test. 
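The addition example can be written out as a short sketch. It assumes that the "100 possible pairs" are the ordered pairs of the digits 0 through 9, and it uses an invented child who is off by one whenever the sum reaches 15; both are illustrative assumptions rather than details from the text.

```python
from itertools import product

def content_standard_score(responses, answer_key):
    """Per cent of the sampled tasks that were performed successfully."""
    correct = sum(1 for task, answer in answer_key.items()
                  if responses.get(task) == answer)
    return 100.0 * correct / len(answer_key)

# Defined domain of tasks: every ordered pair of single-digit numbers (10 x 10 = 100).
addition_domain = {(a, b): a + b for a, b in product(range(10), repeat=2)}

# Hypothetical child: correct on small sums, off by one when the sum is 15 or more.
child = {(a, b): (a + b if a + b < 15 else a + b - 1) for (a, b) in addition_domain}

score = content_standard_score(child, addition_domain)
print(score)  # 10 of the 100 pairs sum to 15 or more, so the score is 90.0
```

Because the whole domain is tested here, the number correct and the per cent correct coincide, as the text notes for this special case.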

The word content in the term we are defining signifies that the 
score is based directly on the tasks which make up or provide the 
content of the test. This is in contrast to normative standard scores, 
which are based on the relative performances of those who have 
taken the test. 

The word standard in this term signifies two rather different
things. The first, and less crucial, is that the scores are reported on
a common scale in which each performance is scored as a per cent
of the maximum possible performance. In this respect content
standard scores are conceptually identical with the familiar but now
generally discredited per cent scores applied to subjectively evaluated
essay tests, and to the once popular but equally questionable
per cent marks on course achievement. Normative standard scores
are also standard in the sense that they are based on a common
scale, although of course the common scale is not a per cent scale but
one in which the mean, standard deviation, and sometimes the shape
of the distribution, are specified.

The second and more crucial significance of standard in content 
test score is that the processes by which the scores are obtained— 
the test construction, administration and scoring—are explicit and 
objective enough so that independent investigators would obtain 
substantially the same scores for the same persons. Normative 
standard scores are seldom “standard” in this second sense. It is also 
important to note that a per cent score on a subjectively chosen 
collection of tasks would not be a content standard test score, as we 
use the term. Unless the score is based on a systematic sample from 
a defined domain of tasks, it cannot provide a very sound basis for 
inferences as to the examinees’ performance on similar collections of 
tasks. 

Content standard test scores are obviously related closely to raw 
scores, which many test specialists appear to hold in low esteem. 
Referring to Johnny’s score of 15 on his spelling test, Thorndike and 
Hagen (1955) say, “Actually as it stands it has no meaning at all 
and is completely uninterpretable.” Noll (1957) says much the same 
thing, “... A single score on a given test is merely a number and has 
no meaning in and of itself.” Stanley and Ross (1954) also dismiss 
raw scores with only a little clemency, “But a raw or point score by
itself means very little.” 

Statements like these are commonly used to introduce, and to 
justify, discussions of derived scores and standard scores in many 
measurement textbooks. In a strict literal sense such statements are 
very nearly true. But in a practical sense they may be misleading. 
Raw scores are seldom obtained or reported in a contextual void. It 
is true, as Travers (1955) says, that “A raw score suffers from the 
defect of being uninterpretable unless additional data are provided.” 
But a standard score suffers from exactly the same defect. No test
score, raw or standard, has much meaning as an abstract number.
Additional data for interpretation must always be provided, either
by the test producer or by the test user from his own knowledge and
experience. The numbers which report standard scores are no more
intrinsically meaningful, and no more self-interpreting, than raw
scores.


To know that a student answered correctly 72 per cent of the
items in a test is meaningful only to the extent of one’s knowledge
of the test items. To know that a student answered more questions
correctly than 67 per cent of a group of students is likewise mean- 
ingful only to the extent of one's knowledge of the group of students. 
It is no more reasonable to assume that a test score can be inter- 
preted adequately in the absence of knowledge of the test itself than 
to assume that it can be interpreted adequately in the absence of 
normative data. Both are essential. 

The fact remains that normative standard scores are currently 
far more popular than content standard scores. The history of edu- 
cational measurement suggests that the popularity of normative 
standard scores was greatly encouraged by the apparent instability 
of raw content scores on objective tests. When subjectively scored 
essay tests were used, the scorer could correct for errors in esti- 
mating the probable difficulty of the questions, easing up in his 
scoring of questions which turn out to be too difficult and becoming 
more particular in scoring the questions which seem too easy. But 
with objective test questions the die was cast when the question was 
written. To have insisted on a minimum passing score of 70 per cent 
correct responses to objective test questions, or any other predeter- 
mined per cent, was obviously unreasonable in the face of evidence 
such as that reported by Monroe (1918) on the unreliability of 
teachers' estimates of the difficulty of essentially objective mathematics
test items. It was also recognized quite early in the development
of objective tests that a test on which the mean score was 85
per cent would probably be far less efficient in differentiating levels 
of achievement than a test on which the mean score was nearer 50 
per cent. 
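One way to see why a mean near 50 per cent differentiates better is to count the pairs of examinees that a single right-or-wrong item separates. The sketch below is a deliberate simplification: it treats one item whose difficulty equals the test mean and assumes a hypothetical group of 100 examinees, ignoring the spread of item difficulties and the correlations among items.

```python
def differentiations(p, n_examinees=100):
    """Number of examinee pairs separated by one item passed by a proportion p."""
    passed = round(p * n_examinees)
    failed = n_examinees - passed
    return passed * failed  # every passer is distinguished from every failer

for p in (0.85, 0.50):
    print(p, differentiations(p))
# 0.85 -> 85 * 15 = 1275 pairs; 0.50 -> 50 * 50 = 2500 pairs
```

Under these assumptions the item at 50 per cent separates almost twice as many pairs as the item at 85 per cent.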

Hence, with the introduction and widespread use of objective 
tests, content standard scores expressed as per cents of correct re- 
sponse lost favor. Travers (1955) speaks of “the famous but dis- 
credited system of converting raw scores into percentages . . .” and 
characterizes it as “. . . another attempt on the part of the teacher 
to convert a relatively meaningless score into one which has some 
commonly accepted meaning.” The alternative was to transform 
the unstable raw score into a standard score related to the actual 
performances of students in some norm group. This alternative has 
become extremely popular. Several methods of transformation, and 
a wide variety of normative standard score scales, have been developed.
At present the use of normative standard scores as the
basis for the interpretation of standard test scores is almost universal
in the United States.

Unfortunately, something important tends to get lost when raw 
scores are transformed into normative standard scores. What gets 
lost is a meaningful relation between the score on the test and the 
character of the performance it is supposed to measure. It is not very 
useful to know that Johnny is superior to 84 per cent of his peers 
unless we know what it is that he can do better than they, and just 
how well he can do it! The very first sentence in John Flanagan’s 
chapter (1950), “Units, Scores and Norms,” in Educational Meas- 
urement makes this point. “Test scores are meaningful and valuable 
to the extent that they can be interpreted in terms of capacities, 
abilities and accomplishments of educational significance.” Later in 
the chapter this comment is added: “The raw score is a very funda- 
mental piece of information, and should not be relinquished in favor 
of some other type of score without good reason.” Rulon, and others, 
have also stressed this point. It is unfortunate, I think, that some 
specialists in measuring educational achievement have seemed to 
imply that knowing how many of his peers a student can excel is 
more important than knowing what he can do to excel them. Note 
that we are not here objecting to invidious comparisons nor sup- 
porting the allegation of psycho-social harm in competition. Our 
point is that when comparative scores are not clearly related to 
specific achievements they tend to have rather limited meaning and 
educational value. 

To be meaningful any test scores must be related to test content 
as well as to the scores of other examinees. Most widely used stand- 
ard test scores can be interpreted in terms of test content only in 
the most general way. If raw scores are reported, and if a copy of
the test is available, the meaning of a score in terms of “capacities,
abilities, and accomplishments” can be determined. But raw scores
often are not reported and copies of the test are not often available. 
I am persuaded that the usefulness of educational measurements is 
seriously limited by the prevailing neglect of content-meaningful 
test scores. In the remainder of this paper I would like to suggest 
two ways in which I believe test scores having content-meaning can 
be secured. 

The first involves the use of “scale books” of selected items. Figure 1
displays a selection of ten items representing the content of
the Mathematics Section of the 1959 Preliminary Scholastic Aptitude
Test, Form HPT2.

As a first step in selecting these ten, all 50 items in the test were
classified by inspection and judgment in the content categories shown
in Table 1. The items are identified by their numbers in Form HPT2.


Figure 1. Ten Items Representing PSAT Mathematics Test, Form HPT2¹

[The figure reproduces the ten items in facsimile. Fractions, diagrams,
and most answer choices are illegible in this copy; the recoverable
text of each item follows.]

74. If there are P girls and R boys in a class, what is the ratio of
the number of girls to the total number of boys and girls in the class?

77. A certain test of 100 problems is scored by subtracting from 100
one point for each problem not answered and 2 points for each problem
answered incorrectly. If a pupil does not answer [fraction illegible]
of the problems, what is the greatest number of problems he can have
wrong and still get a score of 20 on this test?
A. 20   B. 30   C. 35   D. 40   E. 60

80. [Calculation with fractions; expression illegible]
A. 7   B. 8   C. 9   D. 28   E. 56

82. A square carpet with an area of 169 square feet must have 2 feet
cut off one of its edges in order to be a perfect fit for a rectangular
bedroom floor. What is the area in square feet of the bedroom floor?

84. [Algebra item; expression illegible] ... for what value of t does
... = 0?

88. A square and an equilateral triangle have equal perimeters. What
is the length of a side of the triangle if the area of the square is 9?

93. If a boy earns $28 and spends [fraction illegible] as much as he
saves, how much does he save?

104. Par on a certain 18-hole golf course is 75. If a golfer wants his
score to equal par and has a score of 42 on the first 9 holes of this
course, how much less must he average per hole on the last 9 holes
than he averaged per hole on the first 9 holes?

105. In the figure above, each unit along the x-axis represents
[fraction illegible] foot and each unit along the y-axis represents
[fraction illegible] foot. What is the area (in square feet) of
triangle RST?

108. In the figure above, POQ is an equilateral triangle with each
side equal to 12 inches. What is the perimeter (in inches) of the
blackened portion?

1 The answer key for the Ten Items Representing PSAT Mathematics Test,
Form HPT2 (items 74, 77, 80, 82, 84, 88, 93, 104, 105, 108): [key
illegible].


TABLE 1
Classification of PSAT Mathematics Items

Category                                 Item Numbers
 1. Calculations with fractions          66, 73, 80, 91
 2. Verbal problems                      67, 69, 77, 85, 87, 89
 3. Percentage and statistics            68, 71, 98, 104
 4. Problems in algebra                  70, 72, 75, 78, 84, 95, 100
 5. Algebraic formulation                74, 86, 97, 102, 110, 111
 6. Directions and analytic geometry     76, 79, 105, 106, 109
 7. Problems in fractions                81, 83, 93, 107, 113
 8. Areas and volumes                    82, 90, 94
 9. Triangles                            88, 92, 96, 99, 112
10. Circles                              101, 103, 108, 114, 115


Next a measure of the discriminating power of each item was obtained
by subtracting the proportion of correct response in a low
scoring group (100 students whose scaled score on the Mathematics
test was below 300²) from the corresponding proportion in a group
whose scaled scores were above 700. The item in each category which
showed the highest discriminating power was included in the representative
set. We then scored the ten selected items on six sets of
100 answer sheets which had standard scores near 750, 650, 550, 450,
350, and 250, respectively. The results are shown in Table 2.
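The selection rule described here (difference between the proportions correct in the high and low groups, then the best item per category) might be sketched as follows. The item numbers and category names come from Table 1; the proportions are invented for illustration, since the paper does not report them, and the function names are hypothetical.

```python
def discrimination_index(p_high, p_low):
    """Proportion correct in the high-scoring group minus that in the low-scoring group."""
    return p_high - p_low

def pick_representatives(items):
    """Return, for each category, the number of the item with the largest index."""
    best = {}
    for item in items:
        d = discrimination_index(item["p_high"], item["p_low"])
        category = item["category"]
        if category not in best or d > best[category][1]:
            best[category] = (item["number"], d)
    return {category: number for category, (number, _) in best.items()}

# Hypothetical proportions correct; only the item numbers and categories are from Table 1.
items = [
    {"number": 82, "category": "Areas and volumes", "p_high": 0.97, "p_low": 0.15},
    {"number": 90, "category": "Areas and volumes", "p_high": 0.99, "p_low": 0.60},
    {"number": 94, "category": "Areas and volumes", "p_high": 0.80, "p_low": 0.10},
    {"number": 101, "category": "Circles", "p_high": 0.90, "p_low": 0.30},
    {"number": 108, "category": "Circles", "p_high": 0.95, "p_low": 0.05},
]

print(pick_representatives(items))
```

Choosing the most discriminating item in each category keeps the content coverage of the full test while favoring items that separate score levels sharply.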

TABLE 2
Frequency Distributions of Scores by Standard Score Level on
the Ten Representative Items

Score on               Standard Score Level
10 Items      750    650    550    450    350    250
   10          40     13      1
    9          41     30
    8          16     33      9
    7           3     15     28
    6                  8     29      5      1
    5                  1     19     15      1
    4                        13     24      4
    3                         1     28     15
    2                               20     30      1
    1                                8     32     27


2 These scaled scores have a mean of approximately 500 and a standard
deviation of approximately 100.


On the basis of this table it is possible to say that in these samples 
of answer sheets the most frequent number of correct answers to the 
ten representative items was 9 for those whose scaled scores were 
near 750, 8 for those whose scaled scores were near 650, and so on. 
Smoothing and interpolating lead to the estimates shown in Table 3. 
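The reading of "most frequent number of correct answers" can be checked with a small sketch. The frequencies below are transcribed from Table 2 for the four highest score levels only; the assignment of each frequency to a column is inferred from the requirement that every set contain 100 answer sheets, so it should be treated as a reconstruction. The smoothing and interpolation that lead to Table 3 are not reproduced.

```python
# Frequencies keyed by standard score level, then by score on the ten items.
table2 = {
    750: {10: 40, 9: 41, 8: 16, 7: 3},
    650: {10: 13, 9: 30, 8: 33, 7: 15, 6: 8, 5: 1},
    550: {10: 1, 8: 9, 7: 28, 6: 29, 5: 19, 4: 13, 3: 1},
    450: {6: 5, 5: 15, 4: 24, 3: 28, 2: 20, 1: 8},
}

def modal_score(freqs):
    """Score on the ten items obtained by the largest number of answer sheets."""
    return max(freqs, key=freqs.get)

def mean_score(freqs):
    """Average score on the ten items within one standard score level."""
    return sum(score * f for score, f in freqs.items()) / sum(freqs.values())

for level, freqs in table2.items():
    print(level, modal_score(freqs), round(mean_score(freqs), 2))
```

The modes for the 750 and 650 levels (9 and 8) agree with the values stated in the text.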

It is our belief that this set of items, together with Tables 2 and
3, will contribute to a more informative answer to the question,
“What does a score of 600 on Scholastic Aptitude Test mathematics
mean?” than has been available previously. It seems reasonable
to believe that equally useful interpretive data could be
obtained for verbal scores on the Preliminary Scholastic Aptitude
Test, and for other tests as well.


TABLE 3 


Most Probable Scores on Sample Items for Examinees Receiving 
Various Scaled Scores on the Entire Test 


Scaled Score Score on Sample Items 


The second, and possibly more basic way to secure test scores
which have content-meaning is to build the meaning into the test,
and hence into the test score, by systematic, explicitly specified
processes of test construction.

Specialists in educational measurement generally recognize that
most objective tests rest on highly subjective foundations. The
abilities, values, and idiosyncrasies of the test constructor have
played a major part in determining the contents of most tests. Test
specifications sometimes exist only in the mind of the test constructor,
or in a few brief written guidelines. When written they
often have more to say about the form of the test than about its
content. Seldom are the test specifications sufficiently explicit and
so comprehensive that competent test constructors, working independently,
could be expected to produce forms on which raw scores
are essentially equivalent. The processes of test construction often
appear to have more in common with artistic creation than with
scientific measurement!

In this respect, educational tests are distinctly different from most
physical, chemical, or biological tests and measurements. In those 
more scientific fields, carefully specified measurement operations 
are designed to yield highly consistent results, almost regardless of 
the operator. The quantitative sophistication of many specialists in 
educational measurement is displayed, not in the precision and 
elegance of their procedures for obtaining initial measurements, but 
rather in the statistical transformations, elaborations, and analyses 
they are prepared to perform on almost any raw data given them. 
The term “raw” may be particularly appropriate when applied to 
the original data yielded by many educational tests. What we often 
overlook is the limited power of statistical transformations to refine 
these raw data and make them more precisely meaningful. If more 
systematic and standardized processes of test production could be 
developed and used, our educational measurements should become 
not only more consistently reproducible, but what is perhaps even 
more important, they should become more meaningful. 

Bridgman and others have argued that the meaning of a quanti- 
tative concept may be well defined by specifying the operations 
used to measure the quantity involved. Granting this, it follows 
logically that tests of educational traits should yield more meaning- 
ful scores if the operations by which they were produced could be 
stated more explicitly. A test produced by objectively defined 
processes may be less efficient, or lack some kinds of excellence 
which a creatively artistic test constructor might achieve, but the 
increase in objective meaningfulness and reproducibility could more 
than offset the cost. 

At this point it may be desirable to illustrate what is meant by 
an objectively defined process of test construction, and to demon- 
strate that a useful test of at least one educational achievement can 
be constructed by such a process. It seemed to make sense to begin 
with something simple: a test of knowledge of word meanings.
Parallel forms of the test were produced, one by a test specialist and 
the other by an intelligent secretary who had no special training in 


Figure 2. TEST OF KNOWLEDGE OF WORD MEANINGS¹

Directions to the Examinee: First read all the words in the List of Words. Then
look at the first definitional phrase, identified by the number “1” in the List of
Definitions. If you recognize it as the definition of one of the words, write the
number “1” on the line to the left of that word. If not, make no marks. In the
same way match each of the other definitions, if you can, with the appropriate
word, writing its number opposite the word it matches.

List of Words
 1. ____ aroint
 [words 2 through 11 are illegible in this copy]
12. ____ overlay
13. ____ plumb
14. ____ rajah
15. ____ sangaree
16. ____ smew
17. ____ sunken
18. ____ toed
19. ____ uplift
20. ____ whither

List of Definitions
 1. A huge mythical manlike or monstrous being of more than mortal
    but less than Godlike power and endowment.
 2. A large solid-hoofed herbivorous mammal domesticated by man
    since a prehistoric period, used as a beast of burden, a draft
    animal or for riding.
 3. A liquid of camphor-like odor.
 4. A lively Spanish dance, or a tune in its rhythm.
 5. A point of intersection.
 6. A small quantity.
 7. A tropical drink of wine, water and sometimes brandy, sweetened
    and spiced.
 8. A weight of lead attached to a line and used by builders, etc.
    to indicate a vertical direction.
 9. Cloudy.
10. Driven obliquely, as a nail.
11. Empty talk.
12. Lying on the bottom of a river or other water.
13. The money, goods, or estate which a woman brings to her husband
    in marriage.
14. Title of an Indian king, prince or chief, or of a Malay or
    Javanese ruler.
15. To improve the condition of, especially mentally or emotionally.
16. To superimpose or cover.
17. A merganser (Mergus albellus) of northern Europe and Asia, white
    crested in the male.
18. Begone.
19. The dried leaflets of a shrub of the rue family.
20. To what place.

1 The answer key for the sample TEST OF KNOWLEDGE OF WORD MEANINGS
(the entries for words 1 through 5 are illegible in this copy):

 6. 4     11. 6     16. 17
 7. 1     12. 16    17. 12
 8. 2     13. 8     18. 10
 9. 19    14. 14    19. 15
10. 9     15. 7     20. 20


test construction. Both tests were built on the basis of detailed 
written specifications and directions. The tests were based on a 
spaced sample of 100 words from a specified dictionary. Explicit 
instructions were given for choosing a unique but representative 
sample, and for limiting the sample to words appropriate for the test. 
For each word the first synonym or defining phrase was copied from 
the dictionary. 
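A spaced sample of this kind can be sketched as follows. The written specifications themselves are not reproduced in the paper, so the interval rule, the starting offsets for the two forms, and the stand-in dictionary of 45,000 entries are all assumptions made for illustration.

```python
def spaced_sample(entries, sample_size, start=0):
    """Take every k-th entry, where k = len(entries) // sample_size, beginning at start."""
    step = len(entries) // sample_size
    if step < 1:
        raise ValueError("domain is smaller than the requested sample")
    if not 0 <= start < step:
        raise ValueError("start must lie within the first interval")
    return [entries[start + i * step] for i in range(sample_size)]

# Hypothetical alphabetized dictionary; real entries would be head words.
dictionary = [f"entry{i:05d}" for i in range(45000)]

# Two forms drawn by the same rule from different starting points (an assumption).
form_a = spaced_sample(dictionary, 100, start=0)
form_b = spaced_sample(dictionary, 100, start=225)
print(form_a[:3], form_b[:3])
```

Because the rule is fully explicit, two people applying it to the same dictionary would draw the same words, which is the sense of "standard" the paper emphasizes.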

The words were arranged in alphabetical order in a single list.
The defining phrases were also placed in alphabetical order and
numbered from 1 to 100. The student's task was to match the
definitional phrase with the appropriate word. A twenty-item example
of such a test is given in Figure 2.

The two forms of the test were administered to 30 graduate students
in a course on education evaluation at the University of
Southern California in August, 1960. Half of the students took Form
A first; the other half took Form B first. Their scores yielded the
test analysis data shown in Table 4.

The difference between Forms A and B in statistics of both stu- 
dent scores and item difficulty values may seem large for tests which 
purport to be equivalent. But the differences turn out to be well 
within expected limits of sampling error in terms of the wide range 
of difficulty in the domain of tasks from which the samples were 
drawn. The magnitude of these sampling fluctuations, however, 
suggests that tests of far more than one hundred items would be re- 
quired to yield reasonably equivalent scores from alternate forms. 
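The item-difficulty comparison reported in Table 4 can be retraced with a few lines of arithmetic. The means and variances are the table's values; the use of 100 items per form follows from the text, while the divisor n - 1 in the standard error is an assumption, chosen because it reproduces the reported 1.24.

```python
from math import sqrt

n_items = 100                    # items per form, from the text
mean_a, mean_b = 11.19, 9.84     # mean item difficulty, Forms A and B (Table 4)
var_a, var_b = 79.34, 72.02      # variance of item difficulty (Table 4)

# Standard error of the difference between two independent means
# (divisor n - 1 is an assumption; see the lead-in).
se_diff = sqrt(var_a / (n_items - 1) + var_b / (n_items - 1))
t = (mean_a - mean_b) / se_diff

print(round(se_diff, 2), round(t, 2))  # 1.24 1.09
```

A t of about 1.09 with roughly 198 degrees of freedom falls well short of the conventional 5 per cent critical value of about 1.97, consistent with the statement that the difference lies within sampling error.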

TABLE 4
Analysis Data for Two Forms of an Objective Test of Word Knowledge

                                   Form A      Form B
 I. Student Scores
      Mean                         37.[remaining digits and Form B value illegible]
      Standard Deviation           [illegible]
      Reliability                   .92         .95
      Intercorrelation                    .86
II. Item Difficulty
      Mean                         11.19        9.84
      Variance                     79.34       72.02
      Standard Error of Diff.            1.24
      t                                  1.09

These tests constitute one operational definition of the proportion
of words in a certain dictionary for which a person "knows" the
meaning, and hence of the size of his vocabulary in a certain sense. 
No doubt better operational definitions, and more acceptable esti- 
mates of vocabulary size, are possible. But whatever their limita- 
tions, the raw scores on these tests are not meaningless. On the con- 
trary, they are objectively meaningful. 

Systematic test construction, based on explicitly prescribed opera- 
tions, will not be so simple in other areas of educational achieve- 
ment, but there is no apparent reason why it should be impossibly 
difficult or unrewarding. While a systematically constructed test is 
not likely to be as efficient as one constructed by an expert, or one 
built with the help of item tryout data, this problem might be cir- 
cumvented also. It is quite conceivable that systematically con- 
structed tests could become the criteria against which more efficient 
operational tests are calibrated. 

In this presentation our purpose has been to emphasize the need 
for and to demonstrate the possibility of test scores which report 
what the examinee can do. Content-meaning in test scores supple- 
ments but does not replace normative meaning. Both kinds are 
essential, in our view. The more simply, directly, and clearly they 
can be presented, the more useful and educationally fruitful our
tests are likely to be. 


REFERENCES 


Flanagan, John C. “Units, Scores and Norms.” In Educational Measurement
(E. F. Lindquist, Editor). Washington, D. C.: American
Council on Education, 1950.

Monroe, Walter Scott. Measuring the Results of Teaching. Boston:
Houghton Mifflin Company, 1918.

Noll, Victor H. Introduction to Educational Measurement. Boston:
Houghton Mifflin Company, 1957.

Stanley, Julian C. and Ross, C. C. Measurement in Today's Schools.
New York: Prentice Hall, 1954.

Thorndike, Robert L. and Hagen, Elizabeth. Measurement and
Evaluation in Psychology and Education. New York: John
Wiley and Sons, Inc., 1955.

Travers, Robert M. W. Educational Measurement. New York: The
Macmillan Company, 1955.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 1, 1962


SCALES WITH NONMEANINGFUL ORIGINS AND 
UNITS OF MEASUREMENT 


WILLIAM H. ANGOFF 
Educational Testing Service 


The three speakers before me have all made the point that scores
should have the characteristic of yielding direct interpretive information;
and two of the speakers, Dr. Gardner and Dr. Ebel, have
quoted Dr. Flanagan as sharing their general viewpoint. That seems
to make it unanimous, and judging from the title of the present
paper, leaves me with the only dissenting opinion. However, I
would like to make it clear that there is only one point with which
I would take issue with the other members of the panel: whether
meaning should be incorporated into the scores directly by the test
publisher or whether meaning should more properly be brought to
the scores directly by the user himself and indirectly by the publisher.

If we were to examine the various reasons that are given for pre- 
ferring systems of derived score scales for standardized tests rather 
than the original raw score scales, we might find that the reasons 
fall into about four categories: 

One, for the sake of convenience in handling test score data, it is 
frequently desirable to convert raw scores to scales with pre-as- 
signed characteristics in round numbers that are easy to recall and 
easy to use. The stanine scale is a good example of a scale that
possesses this characteristic, as are the IQ scale, the 50-10 scale, and
others.

Two, it is frequently maintained that the original raw score scale
of a test is no more than an ordinal scale and cannot be used, for
example, to compare score changes in different regions of the scale.
In an effort to make comparisons of this sort possible, raw score
scales are converted to derived scales in which the unit separations
between scores are in some sense equal. Dr. Flanagan’s Scaled Score
System and Dr. Gardner’s K-Scores are derived scales of this type.

Three, derived scales are used when more than one form of a test 
is available and the forms are used interchangeably. In such in- 
stances it is considered desirable to equate the forms in order to 
make the reported scores independent of the form used to obtain 
them. It is also considered desirable to report scores on a scale which 
is different from the raw score scale of any form. The derived scale, 
then, exists as a referent for all test forms, which are interrelated 
through a process of equating. This process accomplishes the result
that, within the limits of random error, the reported score earned by 
an examinee would be the same, irrespective of the particular test 
form which actually yielded the score. The College Board Scale is 
one of a number of scale systems that purport to relate test forms 
in this way. 
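The text does not say which equating method is used, so the following sketch substitutes one simple and widely taught technique, linear equating, in which a raw score on one form is mapped to the score with the same standard-score position on the other. The two score lists are hypothetical, and the sketch assumes both forms were taken by equivalent groups.

```python
from statistics import mean, pstdev

def linear_equate(x, form_x_scores, form_y_scores):
    """Map raw score x on Form X to the Form Y raw-score scale by matching z-scores."""
    mx, sx = mean(form_x_scores), pstdev(form_x_scores)
    my, sy = mean(form_y_scores), pstdev(form_y_scores)
    return my + sy * (x - mx) / sx

# Hypothetical raw scores of two equivalent groups, one per form.
form_x = [40, 45, 50, 55, 60]   # mean 50
form_y = [30, 38, 46, 54, 62]   # mean 46, more spread out (a harder form)

print(linear_equate(50, form_x, form_y))
print(linear_equate(55, form_x, form_y))
```

Once every form is mapped in this way to a common referent, a single raw-to-derived conversion can be applied, so the reported score does not depend on which form happened to be taken, within the limits of equating error.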

Four, it is frequently maintained that the raw score scale yields
little or no immediate meaning of its own. For that reason, derived
score scales are established in which normative meaning is directly
incorporated. The simplest process by which this is accomplished
may be described as consisting of the following steps: The test
forms are administered to a random or representative sample of a
defined population, one whose characteristics are presumably well
known. The raw scores for the sample are then collected, pertinent
statistics are drawn up, and a conversion applied, either to yield a
mean and standard deviation with certain pre-assigned numerical
values, or to yield a particular distribution-form with certain pre-assigned
numerical values. This scale is then said to have normative
meaning because the knowledge of any derived score yields immediate
evaluative knowledge of test performance in comparison
with the members of a known population. McCall's T-Scores represent
a scale of this type, as do many others.
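The steps listed above, in the branch that fixes only the mean and standard deviation, can be sketched as a linear conversion. The norming sample and the function name are hypothetical, and the 50-10 targets are one of the pre-assigned choices mentioned earlier; a scale that also fixes the distribution form, as normalized T-scores do, would need an additional step not shown here.

```python
from statistics import mean, pstdev

def make_conversion(norm_sample_raw, target_mean=50.0, target_sd=10.0):
    """Build a raw-to-derived conversion with a pre-assigned mean and standard deviation."""
    m, sd = mean(norm_sample_raw), pstdev(norm_sample_raw)

    def convert(raw):
        return target_mean + target_sd * (raw - m) / sd

    return convert

# Hypothetical raw scores from a sample of the defined population.
norm_sample = [10, 20, 30, 40, 50]
convert = make_conversion(norm_sample)

derived = [convert(r) for r in norm_sample]
print([round(d, 1) for d in derived])
```

A derived score of 60 then says at once that the examinee stands one standard deviation above the mean of the norming sample, which is the normative meaning built into the scale.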

Regarding each of these four characteristics that test constructors 
consider for their test scales, there appears to be no problem, first of 
all, in connection with the characteristic of convenience. Unless 
there were some overriding consideration, it is difficult to imagine 
why one would choose to assign a number like 81.27 as the mean, 
for example, of a distribution of derived scores rather than a more 


WILLIAM H. ANGOFF 2 


convenient one like 50 or 100. Secondly, as far as equality of units 
is concerned, there is no question about the importance of this 
problem in measurement. There appears to be no issue here either. 
With regard to the question of form-to-form equating, it seems 
not only reasonable but essential, in a continuing testing program 
where new test forms are frequently introduced, or in a system of 
test offerings where more than one form of a test exists, that some 
means be provided to report scores on a single scale independent of 
test form. A system of this sort, which can exist only as a result of 
form-to-form equating, makes it possible to ensure that within the 
limits of equating error, an examinee’s reported score will be unbiased
by the form he happened to take. It also ensures that over
the course of time, irrespective of the introduction of new forms and 
the abandonment of old forms, the scale will continue to stand and 
will continue to yield interpretable data comparable with data col- 
lected at earlier times. The characteristic of scale constancy makes 
it possible to observe, among other things, shifts and changes in the 
ability of the groups tested over a period of time.
With regard to the normative characteristic of the scale, there is,
it seems to me, some serious question whether this is always essential
or even always desirable. The definition and construction of this
type of scale presupposes that there exists a population which is
sufficiently unique among all those possible to warrant its choice as
the referent population for the system of tests and scores under
consideration. This is questionable. There are usually many populations
that can be used for this purpose. What frequently happens in
scaling the test is that the group chosen to form the normative basis
for the scale is a very general one, often too general to use for
specific score interpretation. In order, then, to provide data for
specific decisions, such as for guidance, placement, or remedial
education, additional interpretive aids have to be devised (for
example, differentiated norms, local norms, regression equations,
and content interpretations), and the group that was originally
incorporated into the scale to give it normative meaning often goes
unused or, perhaps, is used when it shouldn’t be.

This would lead us to ask: What kind of scale should take the
place of the normative scale? The answer to this is particularly difficult
to give because it appears to take away and not to give in return.
What is suggested here is a non-normative scale, a scale that


30 EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 


has no normative meaning at all. The mechanics of defining the 
scale of numbers can be as simple as one wishes. If the conversion is 
to be a linear one, one can set the minimum and maximum scores on 
one of the forms at desired scale values, say 20 and 80, and the scale 
for the system of equated test forms, when such a system exists, is 
automatically defined. Or, one can choose some convenient group of 
examinees who are not seriously atypical of those who will be ex- 
pected to take the tests in the future, and set the mean and standard 
deviation of this group equal to a pair of convenient numbers in 
order to transform the raw scores. This group, it should be emphasized,
is only a conveniently available group. It need not possess
special normative characteristics; indeed, it may be a group that 
would not ordinarily be used for normative interpretation. If the 
scale is to be so defined that unit separations are to be made equal 
in some sense, then this can be done by adjusting the raw score in- 
tervals in the desired fashion. This system of adjusted raw score 
units can then be translated to a new, more convenient system in
direct linear fashion in either of the ways described above. The final
derived scale would still, of course, retain the relationships among
the interpoint distances of the desired interval scale.
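
The first of the two mechanics described above, fixing the minimum and maximum raw scores of one form at 20 and 80, can be sketched as follows. This is a minimal illustration only: the function names and the 120-point raw score range are hypothetical, and the second route (assigning a convenient mean and standard deviation to an available group) would differ only in how the slope and intercept are obtained.

```python
def scale_from_endpoints(raw_min, raw_max, scale_min=20.0, scale_max=80.0):
    """Return (slope, intercept) of the linear conversion that sends
    raw_min to scale_min and raw_max to scale_max."""
    if raw_max == raw_min:
        raise ValueError("raw score range must be non-zero")
    slope = (scale_max - scale_min) / (raw_max - raw_min)
    intercept = scale_min - slope * raw_min
    return slope, intercept

def convert(raw, slope, intercept):
    """Apply the linear conversion to one raw score."""
    return slope * raw + intercept

# Hypothetical form with raw scores running from 0 to 120.
slope, intercept = scale_from_endpoints(0, 120)
midpoint = convert(60, slope, intercept)
```

Nothing in this definition refers to the performance of any group, which is what makes the resulting scale non-normative.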

The significant point here is simply that it does not matter how 
the number system is originally defined. The scale can be referred 
to as a general range of numbers, one that exists without inherent
meaning and one that serves only as a referent or vehicle for equating
the system of test forms. When such a system of test forms is in
operation, 


of the alternative, the normative scale. It may therefore be helpful
to examine these problems. 




obsolete in ten or twenty years and that the normative scale would 
exist without normative meaning. At that time the real dilemma 
would have to be faced: whether to maintain the scale and abandon 
any pretense of current normative meaning, or to redefine the scale 
in terms of a currently meaningful group and give up automatic 
continuity with the past. Personally I see little hope of a satisfactory 
compromise here. 

The point is made here that it is the passage of time and the 
changes brought on with the passage of time that test the usefulness 
of a scale. And changes do occur, sometimes dramatic ones. In a
1948 article in the American Psychologist, Tuddenham reported 
that when the Wells Revision of the Army Alpha was administered 
to a sample of World War II enlisted men it was found that their 
median score of 104 fell at the 83rd percentile for the soldiers tested 
in World War I with the original Army Alpha. The World War I 
median of 62 corresponded to the 22nd percentile of the World War 
II soldiers. Even allowing for the small differences in difficulty in 
the two tests, the difference in these norms is still striking. Tudden- 
ham attributes the change to increased familiarity with objective 
tests and, even more, to superior educational opportunity in the 
1940's. Whatever the reasons are, there is no question that popula- 
tions can and do change with time. Test scales must be built to be 
adaptive to the change. 

For purposes of illustration I would like to consider a testing 
program like the Scholastic Aptitude Test Program of the College
Board. Here is a program in which there are a number of extant 
forms of, say, the Verbal Test in current use, all interrelated on a 
single scale which is maintained through a system of equating, going 
back to the time when the scale was first established. This scale, still 
in use today, was originally defined as one which yielded a mean of 
500 and standard deviation of 100 for the group of candidates who 
were tested in April, 1941, a group of 10,766 candidates applying to
one or more of the 45 colleges who constituted the membership of 
the College Board at that time. In the academic year 1959-60 the 
comparable number of College Board candidates had risen to over
566,000—approximately thirty times the number tested in the entire 
year 1941. The number of member colleges of the College Board has 
also grown—from 45 in 1941 to 287 in 1960. The point does not have 
to be made any more emphatically than is already made by these
figures that the last twenty years have brought about some considerable
changes in the character of the College Board applicant
group, changes which, it seems to me, render the original group
inappropriate as a normative group today. Indeed, there is some
serious question in this instance whether the scale ever had normative
meaning, since the group was not defined for its highly meaningful
normative value, but simply because, you might say, it was
there, and there was no other immediately available. In any case,
normative or not at the time of its definition, it most certainly does
not have any normative meaning today. Yet it is very significant
that, in spite of the absence of inherent normative meaning, the scale
does have meaning of another kind, meaning that is provided in the
normative data published by the College Board, and even more,
meaning that is acquired in the minds of the test users in dealing with
the scores over the course of time. Obviously, then, while a scale can
derive one kind of meaning from its definition, it can also derive
meaning from the experience that the user acquires in applying the
scale to the measurement of familiar objects. Therefore, these principles
can be stated here: One, that the meaning that is invested in a
scale at the time of its definition is not lasting; indeed, there is some
question whether it is useful. The real meaning in a scale is the meaning
given to it by the user over a period of time with experience and
familiarity and with normative aids. Two, that a scale has a reasonable
chance of being meaningful to the user if it does not change.
For both of these principles an analogy taken from everyday
measurement is helpful, it seems to me. There is hardly a person
here who knows the precise original definition of the length of the
foot used in the measurement of height or distance, or which king
it was whose foot was originally agreed upon as the standard; on the
other hand, there is no one here who does not know how to evaluate
lengths and distances in terms of this unit. Our ignorance of the
precise original meaning or derivation of the foot does not lessen its
usefulness to us in any way. Its usefulness derives from the fact that
it remains the same over time and allows us to familiarize ourselves
with it. Needless to say, precisely the same considerations apply to
other units of measurement: the inch, the mile, the degree of
Fahrenheit, and so on. In the field of psychological measurement
it is similarly reasonable to say that the original definition of the
scale is or should be of no consequence. What is of consequence is
the maintenance of a constant scale (which, in the case of a multi-form
testing program, is achieved by rigorous form-to-form
equating) and the provision of supplementary normative data to
aid in interpretation and in the formation of specific decisions, data
which would be revised from time to time as conditions warrant.
Let us suppose that a suitable norms population had been chosen
for the College Board Program in 1941. How suitable would that
population be today, not only because the numbers of candidates
are so much greater today than they were then, but also because
qualitatively they are different kinds of people? And it is entirely
proper to ask this question, even though the mean performance of all
candidates tested today is not much different from the mean performance
of all candidates tested in 1941.
Suppose further that the candidate population changes, not only
in qualitative characteristics but also in level or dispersion of performance.
Should the scale be redefined in terms of a normative
group relevant to current examinees, or should the scale be retained
and simply defined as one which does not yield automatic normative
information, in much the same way as the commonly used physical
units are defined? Even more significant than the considerations
discussed thus far is the fact that if the scale is redefined in terms of
an appropriate current normative group, then the act of altering
the scale through redefinition would necessarily bring about the loss
of that very characteristic of the scale that made it possible to
observe the change in the group’s relevance in the first place: the
characteristic of the constancy or continuity of the scale. It would
seem that the appropriate decision here is to retain the existing scale,
however one cares to regard its meaning or lack of it, and direct the
larger effort toward maintaining and improving a stable and continuing
score reporting system.
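
The loss of continuity under redefinition can be made concrete with a small numerical sketch. Everything here is assumed for illustration: the two groups are invented, their raw scores are taken to be on the same (or an equated) form, and the target values of 500 and 100 simply echo the College Board scale mentioned earlier.

```python
from statistics import mean, pstdev

def fit_linear_scale(reference_raw, target_mean=500.0, target_sd=100.0):
    """Return (slope, intercept) giving the reference group the target
    mean and standard deviation."""
    m, s = mean(reference_raw), pstdev(reference_raw)
    slope = target_sd / s
    return slope, target_mean - slope * m

def apply_scale(raw_scores, slope, intercept):
    return [slope * x + intercept for x in raw_scores]

original_group = [30, 40, 50, 60, 70]  # hypothetical defining group
later_group = [35, 45, 55, 65, 75]     # hypothetical later group, 5 raw points higher

# Retained scale: the conversion fixed on the original group is reused.
fixed = fit_linear_scale(original_group)
later_on_fixed = apply_scale(later_group, *fixed)

# Redefined scale: the conversion is refitted to the later group.
redefined = fit_linear_scale(later_group)
later_on_redefined = apply_scale(later_group, *redefined)
```

On the retained scale the later group's mean is visibly above 500, so the shift can be observed; on the redefined scale the later group's mean is 500 by construction and the shift disappears from the reported scores.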

The definition of the number system for a test scale is a commitment
for the future. If a single normative group is available and if it
is meaningful above all others for making score interpretations, and
if future populations of examinees are not expected to differ in any
significant way from the population tested today, then the test scale
can be defined in a normative fashion. If this is not the case (and
quite often it will not be), then the scale of numbers should be
chosen in the most arbitrary way possible and then quickly forgotten.
If the test constructor feels that this process of scale definition
will deprive it of meaning, he is unduly pessimistic. Meaning will
come, but only through his efforts to provide a variety of current
and useful norms, and through the experience of the users of his
tests.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 1, 1962


DISCUSSION 


JOHN C. FLANAGAN 
American Institute for Research and University of Pittsburgh 


I have found the three papers in this symposium to be excellent
and stimulating discussions. Each emphasizes somewhat different
points of view, but all appear to start from a common agreement on
the fundamental properties of aptitude and achievement standard
scores. The points of agreement are that in current practice few raw
scores have intrinsic meaning, and that meaning and measurement
properties would be improved by reporting scores on a scale having
equality of units throughout the scale. Meaning can be built into
the scores by obtaining a systematic sample of items for the test
from a clearly defined domain of such tasks as emphasized by Ebel,
mentioned by Gardner, and agreed to by Angoff. Meaning can be
given the scores by administering the test to a specifically defined
population of students and deriving standard scores with specific
points of reference in relation to the population and “equal units”
obtained by a scaling procedure as outlined by Gardner. Finally,
meaning can be obtained for the scores by provision of various types
of normative data and by the personal experience of the user. This
is recommended essentially as a substitute for including normative
data in the score scale by Angoff, and as a necessary supplement to
other procedures by Gardner and Ebel.

My own view is that all three types of meaning are greatly
needed. Probably the most neglected in current practice, and therefore
worthy of special emphasis at this time, are content standard
scores of the type described by Ebel. The desirable property proposed
by Gardner, that items in the test form a Guttman Scale,
seems unworthy of mention in such a list since it is both impractical
and unnecessary. In terms of the specifications for almost all aptitude
and achievement tests, the requirement that we have a Guttman
Scale would, as Gardner points out, eliminate from the final
test important concepts and skills included in the specifications. The
requirements for a Guttman Scale involve such a high degree of
similarity in the items that they become so specific in relation to
measurements of culture-affected tasks that the scale represents
only what would be a single item in a practical measuring instrument.
It is therefore my view that the concept of the Guttman
Scale has practically no application in connection with aptitude
and achievement tests.

I am in quite close agreement with the other points in Gardner's
paper. 

In Angoff's paper, while agreeing with many of the properties of 
standard scores indicated as desirable by Gardner, the question is 
raised: “Whether meaning should be incorporated into the scores 
directly by the test publisher, or whether meaning should more 
properly be brought to the scores directly by the user himself and 
indirectly by the publisher.” It is concluded by Angoff that the scale
of numbers should be chosen in the most arbitrary way possible, 
and then quickly forgotten. 

The points he introduces in support of this view are:

1. The meaning invested in a scale at the time of its definition is
not lasting.

2. A scale has a reasonable chance of being meaningful to the
user if it does not change.

These appear to be good points and deserving of serious consideration.
However, after due consideration, it is my belief that they
should be rejected for general application to the development of
standard scores for aptitude and achievement tests.

When the system of Scaled Scores for the Cooperative Achievement
Tests was developed in 1939, the basic reference point selected
for all tests was a score of 50. This score represented the score
achieved on this test by persons of average ability (I.Q. of 100)
who had had the usual amount of the subject in the specified grade
and in a school having a quality of instruction typical of that
throughout the United States. It was my hope at the time that these
were developed that there would be enough change in content and
enough improvement in the quality of instruction so that the tests
would have to be restandardized at least every ten years. It is my
impression that there has not been sufficient improvement in sec- 
ondary school education during the past twenty years to introduce 
serious errors into the Scaled Scores of the Cooperative Achievement 
Tests. It is profoundly to be hoped, however, that the next ten 
years will witness such striking improvement that restandardization 
would be essential. During the period it would certainly be gratify- 
ing to retain a system of scores reflecting past norms of achievement 
and note the genuine progress being made in improving the quality 
of instruction. 

Perhaps the most serious defect of the use of arbitrary non-mean- 
ingful scales is in the lack of comparability for scores for different 
courses or aptitudes. If one starts with the same arbitrary scale in
all subjects and never changes, as Angoff proposes, improved meth- 
ods of teaching mathematics might result in moving average per- 
formance up ten points over a period of ten years in mathematics 
courses while improvement in language courses represented only 
about two points. A score of 58 would then mean something quite 
different for language and mathematics courses if they had been 
similar in meaning originally. Restandardizing every ten years 
would get comparability from subject to subject back into the 
standard scores at least every ten years. 
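
The size of the discrepancy in this example can be gauged with a short calculation. The sketch below assumes, purely for illustration, that scores in each subject are normally distributed with a standard deviation of 10 and that both subjects started with an average of 50; none of these figures is given in the discussion above, and `percent_below` is a hypothetical helper.

```python
from statistics import NormalDist

def percent_below(score, group_mean, group_sd=10.0):
    """Per cent of a normally distributed group scoring below `score`."""
    return 100.0 * NormalDist(mu=group_mean, sigma=group_sd).cdf(score)

# After ten years: mathematics average up ten points, language up two.
math_rank = percent_below(58, 50 + 10)
language_rank = percent_below(58, 50 + 2)
```

Under these assumptions a score of 58 falls below the average in mathematics (roughly the 42nd percentile) but well above it in language (roughly the 73rd percentile), although the same score would have had the same standing in both subjects when the scale was defined.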

It is clear that most aptitude tests including those for academic 
aptitude, sometimes called intelligence tests, are greatly influenced 
by school achievement. It would seem very undesirable to have 
mental ages and I.Q.’s tied to the performance of students in 1916 
for an indefinite period. 

It also seems to me that the experience of the user is likely to be 
misleading. For example, if he gets used to thinking of 65 as an out- 
standing score and—because of improved instructional procedures 
—more than 90 per cent of persons throughout the country are ex- 
ceeding this score, he may find himself retaining his earlier concept 
of the meaning of this score when it no longer has this significance. 

The term content standard test score as used by Ebel is a num- 
ber that indicates the per cent of a systematic sample from a de- 
fined domain of tasks which an individual has performed success- 
fully. The first of the two examples given by Ebel, indicating the 
most probable number of a group of ten sample items answered 
correctly by students obtaining various scaled scores on the entire
test, provides a useful short-cut for interpreting a standard score in
terms of actual test items. The second example, in which tests were
built on the basis of the detailed written specifications regarding
the spaced sampling of 100 words from a specified dictionary, indicates
the proportion of the words in this dictionary for which the
first synonym listed is known by the student.
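
A content standard score of this second kind can be sketched in a few lines. The code below is a simplified illustration, not Ebel's procedure: the 20,000-entry list stands in for a dictionary, the student's knowledge is modeled as simple set membership rather than recognition of a first synonym, and the function names are hypothetical.

```python
def spaced_sample(domain, n):
    """Take n evenly spaced entries from an ordered domain
    (a systematic, or spaced, sample)."""
    if not 0 < n <= len(domain):
        raise ValueError("n must be between 1 and the size of the domain")
    step = len(domain) / n
    return [domain[int(i * step)] for i in range(n)]

def content_standard_score(sample, known):
    """Per cent of the sampled tasks performed successfully."""
    return 100.0 * sum(1 for item in sample if item in known) / len(sample)

# Hypothetical stand-in for an alphabetized dictionary.
dictionary = [f"word{i:05d}" for i in range(20000)]
sample = spaced_sample(dictionary, 100)

# Hypothetical student who knows the first quarter of the entries.
known_words = set(dictionary[:5000])
score = content_standard_score(sample, known_words)
```

Because the sample is spread systematically over the whole domain, the per cent correct on the 100 sampled words serves as an estimate of the per cent of the entire dictionary that the student knows.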

Although various types of content scores have been discussed for
many years, only a few such scores have been developed. One example
is the early C-Scales, which were prepared in such a way
that a person could note the individual's score and look directly
at a set of scaled items to see the difficulty of the items which represented
his ability level. Another example is Robert Seashore's
Vocabulary Test based on a random sample of words from the
dictionary published about twenty-five years ago. In 1939, 1947,
and 1950, the present author made recommendations regarding content
scores. It is not clear whether the very slow progress in developing
content scores is due to the difficulties involved or the fact
that no one seemed to need them.

In connection with the development of the tests for Project
TALENT, an effort was made to select items in such a way as to
maximize interpretations in terms of content. For example, in the
section on capitalization, each of the rules regarding capitalization
was used exactly twice. Similarly, items in literature were selected
in such a way that a statement could be made regarding the portion
of the usually recommended books which the individual had read.
In the three areas of spelling, vocabulary, and reading, it was believed
that tests representing systematic samples of domains would
be inefficient for use in a large testing program such as this. Therefore,
a special spelling test was prepared including a sample of
three hundred words selected from the five thousand most frequently
used words according to a systematic random procedure.
This test was given along with the shorter multiple-choice test of
spelling used in Project TALENT to determine the proportion of
words in the 5000-word list which a student making a particular
score on the Project TALENT spelling test could be expected to spell
correctly. A similar sampling of word meanings was made from
a group of dictionaries. This test has been equated to the vocabulary
test in Project TALENT so that statements can be made regarding
the number of word meanings known to high school students.




To provide interpretation for scores on the Project TALENT 
reading test in terms of meaningful content, this test was equated 
to random samples of reading content from authors judged to represent
various levels of difficulty, beginning with Louisa May Alcott
and going up to Dostoevski and Thomas Mann. In a similar fashion, 
paragraphs were selected from magazines using certain movie maga- 
zines to represent the simplest levels and the Atlantic Monthly and 
the Saturday Review of Literature at the top level, with several 
other magazines in intermediate positions. It is hoped that experi- 
ence in relation to these studies and reports will popularize content 
scores for use in interpreting aptitude and achievement test scores.

In conclusion, I am very glad to have had the opportunity to
participate in this symposium and I feel that each of the participants
has raised important points with respect to the function of
scores in assisting in the interpretation of test results. I am sure
that the technicians have much more work to do if today’s test
users are to derive complete and correct meaning from the test
scores of their students.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT
Vol. XXII, No. 1, 1962


RESPONSE STYLE AND CONTENT MEASURES 
FROM PERSONALITY INVENTORIES¹


SAMUEL MESSICK 
Educational Testing Service 


The pervasive influence of reliable response sets or stylistic response
consistencies has been frequently noted on a variety of personality
and attitude scales (cf. Cronbach, 1950; Edwards, 1957;
Fricke, 1956; Jackson & Messick, 1958; Messick & Jackson, 1958).
Recent research has emphasized two major response styles in particular
which appear to determine a considerable proportion of
response variance on personality questionnaires, namely, the tendency
to respond in a desirable or undesirable manner and, for
inventories having a true-false or agree-disagree format, the tendency
to acquiesce (cf. Couch & Keniston, 1960; Jackson & Messick,
1960; Messick & Jackson, 1961; Wiggins & Rumrill, 1959). On
one hand these response sets are confounded with legitimate replies
to item content and hence introduce errors of interpretation into
the logical validity of scales (Cronbach, 1946; Edwards, 1957), but
on the other hand response styles are also stable and reliable components
of performance, which may reflect consistent individual
styles or personality traits (Berg, 1955; Frederiksen & Messick,
1959; Jackson & Messick, 1958).

These two aspects of response sets (as a source of error variance
in content scores and as a source of reliable variance for stylistic
personality measures) produce dual problems of measurement and
control, and separate solutions to these problems of measuring response
styles and of controlling their influence on content scales
may not be wholly satisfactory from the viewpoint of substantive
personality assessment. The present paper attempts to review several
suggested procedures for controlling response set effects on
personality scales and to evaluate their efficacy, not only in producing
adequate content measurement, but in properly reflecting
stylistic personality variance and content-style interrelationships.
A factor analytic procedure for the joint measurement and control
of response styles will also be illustrated and evaluated empirically
as a possible means of representing both stylistic and content variance
within the same analytical framework.

¹ The author wishes to thank Dr. Douglas N. Jackson for his cooperation
in making testing arrangements and in obtaining subjects, for his help in
administering the questionnaires, and for his thoughtful comments throughout
the course of the study. Grateful acknowledgment is also due to Dr. Ledyard
Tucker for his helpful advice on the analysis and to Miss Henrietta
Gallagher for supervising the computations.

One solution to the control problem, for example, would involve 
the development of independent measures of the response sets of 
acquiescence and desirability, which could then be statistically 
partialed out of content scores (Webster, 1958, 1959). However, 
such a partial correlation approach produces corrected content scales 
which are uncorrelated with the response styles in question, a de- 
sirable feature for control purposes but a severe limitation to be 
arbitrarily placed upon the description of personality relationships. 
Since reliable tendencies to acquiesce or to respond desirably may 
in themselves reflect certain needs or traits in the respondent, some 
legitimate correlations might be expected between measures of re- 
sponse style and “pure” content scales. Such a correction procedure 
may be satisfactory when primary emphasis is upon the prediction 
of criteria, but it would appear to restrict unduly the empirical 
emergence of stylistic and content interrelationships. 
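
The consequence noted above, that corrected content scales end up uncorrelated with the response style that was partialed out, can be seen in a small sketch. This is an illustration under simplifying assumptions and not the procedure of the cited studies: only one style measure is removed, by simple least-squares regression, and the scores and function names are invented.

```python
from statistics import mean

def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def partial_out(content, style):
    """Residuals of content scores after regressing them on a style measure."""
    ms, mc = mean(style), mean(content)
    slope = (sum((s - ms) * (c - mc) for s, c in zip(style, content))
             / sum((s - ms) ** 2 for s in style))
    return [c - (mc + slope * (s - ms)) for s, c in zip(style, content)]

# Hypothetical scores for six respondents.
acquiescence = [2, 4, 5, 7, 9, 10]
content_scale = [11, 12, 16, 15, 20, 22]

corrected = partial_out(content_scale, acquiescence)
r_before = pearson(content_scale, acquiescence)
r_after = pearson(corrected, acquiescence)
```

Whatever the correlation before correction, the corrected scores correlate zero with the style measure by construction, so any substantively real relationship between content and style is removed along with the bias.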

Another technique frequently suggested for the partial control of 
response biases is the use of balanced scoring keys (cf. Messick & 
Jackson, 1958), whereby each content scale is constructed to have 
an equal number of true and false items. This scale construction 
rule is intended primarily as a control for acquiescence, although a 
similar recommendation might also be made for equal numbers of 
desirable and undesirable items (Hand & Reynolds, 1961). Although 
a more precise cancellation of acquiescence would be effected by
equating the amounts of variance contributed by true and false
items to the underlying content dimension on each scale, such a
nuance could be achieved only with considerable labor, if at all. A
simple balancing of the number of true and false items, however,
appears to provide important, but not always sufficient, insurance
against the spurious influence of acquiescent response set upon
total scale scores, provided, of course, that equally meaningful,
unambiguous, and compelling items can be written in both direc- 
tions. Correlations between such balanced scales and independent 
measures of acquiescence can thus be less equivocally interpreted 
in terms of content variables associated with agreement tendencies 
(Couch & Keniston, 1960). It is with respect to controlling for de- 
sirability bias, however, that the search for balanced keys appears 
to be a much less effective general strategy, primarily because item 
desirability is usually confounded with content scoring for desir- 
able and undesirable traits, with the possible exception of a narrow 
range of neutral traits and perhaps of scales for which “subtle” 
items are available. Thus, items keyed “true” as content measures 
of a desirable trait would also tend to be keyed “true” in terms of 
item desirability, so that a generalized tendency to respond in a
desirable direction would tend to produce spuriously high scores 
for desirable traits and spuriously low scores for undesirable traits. 
Because of this partial confounding of desirability and content 
scoring, the set to respond desirably also does not appear amenable 
to such procedures as those suggested by Helmstadter (1957) or 
Chapman and Bock (1958) for estimating separate set and content 
components of scale scores, although these models do appear 
promising for appraising the influence of acquiescent response set.
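
The way a balanced key limits the effect of acquiescence on total scores can be illustrated with an extreme case. The sketch below is a toy example with invented ten-item keys; it considers only respondents who answer every item the same way regardless of content, and the function name `scale_score` is hypothetical.

```python
def scale_score(responses, key):
    """Number of responses that match the keyed direction."""
    return sum(1 for r, k in zip(responses, key) if r == k)

# Hypothetical ten-item scales.
balanced_key = [True, False] * 5   # five items keyed true, five keyed false
unbalanced_key = [True] * 10       # every item keyed true

# Respondents who ignore item content entirely.
yea_sayer = [True] * 10
nay_sayer = [False] * 10

balanced_scores = (scale_score(yea_sayer, balanced_key),
                   scale_score(nay_sayer, balanced_key))
unbalanced_scores = (scale_score(yea_sayer, unbalanced_key),
                     scale_score(nay_sayer, unbalanced_key))
```

On the balanced key both respondents receive the same middling score, so pure agreement tendency does not separate them; on the unbalanced key they fall at opposite ends of the scale although neither has responded to content. The same device does not help with desirability bias when item desirability is itself tied to the keyed direction, as the text explains.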

Another approach to controlling response sets involves the use 
of a forced-choice item format, such as that employed by Edwards 
(1957) in developing the Personal Preference Schedule. With this 
technique, statements are presented to the subject in pairs, the 
members of each pair having been previously selected to be as
equal as possible in average judged desirability, and the respondent 
is required to select from each pair that statement which is more 
descriptive of his personality. Such forced-choice items do not offer
the opportunity for simple agreement or acquiescence responses, 
and since the paired statements are also presumably matched in 
desirability, a consistent tendency to respond desirably should in 
principle have little effect upon item choices. Carefully equated 
forced-choice content scales thus offer considerable promise for the 
measurement of content traits and, used in conjunction with independent
measures of acquiescence and desirability, for the evaluation
of content-style interrelationships.

Although forced-choice items effectively eliminate acquiescence, 
several experimental difficulties seriously limit the adequacy of their 
control for desirability bias. For one thing, it is difficult to equate 
the desirability of statements with sufficient precision by means of 
average ratings, or even in terms of such psychometric scaling tech- 
niques as successive intervals (Edwards, 1957), primarily because 
the number of categories provided on the rating scale is usually 
small in comparison with the number of items required for a per- 
sonality inventory. Apparently small differences in rated desirabil- 
ity are then magnified when the subject is subsequently asked to 
choose between statements presented in the sensitive item format 
of a pair comparison (cf. Corah, et al., 1958; Edwards, Wright & 
Lunneborg, 1959). For another thing, the use of a single dimension 
of average judged desirability for equating statements oversimpli- 
fies a multi-dimensional domain and ignores differing individual 
viewpoints about desirability (Messick, 1960) and possibly impor- 
tant distinctions between social and personal desirability (Rosen, 
1956). Such a control method could provide inconsistent effective- 
ness for individual assessment, depending upon each respondent’s 
conformity to the group consensus of desirability. Also, if the tend- 
encies to acquiesce and to respond desirably are important personal- 
ity variables in their own right, then an inventory based exclusively 
upon such a forced-choice format might primarily reflect response 
variance which is unrelated to these response styles. In view of the 
extremely high correlation repeatedly noted between judged de- 
sirability and item endorsement (cf. Edwards, 1953, 1957; Hanley, 
1956; Wahler, 1958), the proportion of valid personality variance 
unrelated to stylistic variables may turn out to be small and per- 
haps even trivial (cf. Jackson & Messick, 1960). 

An analytical procedure is desired which would separate response 
variance due to various content dimensions from variance due to 
response sets and would also differentiate between variance attribu- 
table to different response styles, such as acquiescence and desir- 
ability. At the same time, however, the procedure should permit 
stylistic and content dimensions to be correlated if such is the case 
substantively. 


A major purpose of the present paper is to illustrate and evaluate
an analytical procedure which potentially meets such requirements.
The procedure involves the application of factor analysis with 
oblique rotations to intercorrelations among content scales and in- 
dependent measures of response styles. However, several experi- 
mental prerequisites must be fulfilled before factor analysis can be 
reasonably expected to clarify the complex domain of content and 
style interrelationships. For example, in order to reinforce the in- 
terpretation of some of the obtained factors as stylistic dimensions, 
independent marker variables for acquiescence and desirability 
response sets should be included in the analysis. Acquiescence scales
should contain either items neutral in desirability or equal numbers 
of desirable and undesirable items to obviate the effects of a con- 
sistent tendency to respond desirably, and item content should be 
either heterogeneous or systematically counterbalanced or both. 
Similarly, desirability scales should contain equal numbers of items 
keyed true and false in the desirable direction to offset the influ- 
ence of acquiescence, and item content should also be heterogeneous 
or counterbalanced. If a single total score is included in the corre- 
lation matrix for each content measure, then these scales should at 
least contain equal numbers of true and false items. Forced-choice 
content scales could also be employed in such an analysis, pro- 
vided that appropriate double-centered factor procedures (Tucker, 
1956) are used for those forced-comparison scales which are 
ipsative in nature. 

In the present application the “true” and “false” items of each 
content scale were treated separately, so that two scores were in- 
cluded for each scale, the sum of which would generate the usual 
total content score. Several advantages accrue from this procedure, 
since (1) it permits an evaluation of the possibly differential influ- 
ence of acquiescence on “true” and “false” items; and (2) it permits 
a relatively unequivocal interpretation of content dimensions when
both subscales are found to load the same factor (cf. Jackson & 
Messick, 1960). 


Method 


Subjects 


The sample consisted of 145 students, 89 males and 56 females, 
from a large eastern state university; 37 subjects were freshmen, 
83 were sophomores, and 25 were upperclassmen. The age range
was 17 to 34, with a mean of 19.8 and a standard deviation of 1.6
years. 


Variables 


Ten content scales were selected from the Personality Research 
Inventory (PRI) so as to cover a reasonably broad range of con- 
tent thought to be potentially relevant to the response styles of 
acquiescence and desirability. The PRI as a whole consists of 25 
relatively homogeneous ten-item scales purporting to measure 25 
substantially independent personality characteristics (Saunders, 
1955). The ten scales selected bear the labels: Anxiety, Tolerance 
of Frustration or Ego-Control, Tolerance of Ambiguity, Compul- 
siveness, Impulsiveness, Self-Sufficiency or Independence, Social 
Conformity in Controlling Aggression, Achievement Striving, Affec- 
tive-Effective Differences, and Unconventionality. Since most of 
these scales consisted of unequal numbers of “true” and “false” 
items, several of the item wordings were reversed to provide five 
“true” items and five “false” items for each scale. These true and 
false items were then treated separately, so that two scales were 
used to represent each content area, a true scale and a false scale, 
both keyed in the content direction. Thus twenty content measures 
were included in the analysis. 

Two scales were specially constructed to measure acquiescence 
and two other scales to measure the tendency to respond desirably. 
This scale construction was greatly facilitated by the use of ratings 
of desirability for the 100 items in the ten content scales, which had 
been previously obtained from the same subjects in a different con- 
text. The two acquiescence scales each consisted of 12 items, two 
from each of six content scales. One of the two items from a given 
content scale had been keyed “true” and the other "false" in the 
content direction, and since all the items on the acquiescence scales 
were keyed "true," consistent responses in terms of content were 
thus counterbalanced. In addition, the use of six different content 
areas insured a relative heterogeneity of subject matter. The items 
were also selected to counterbalance desirability, there being six 
moderately desirable and six moderately undesirable items on each 
acquiescence key. One of the acquiescence scales (Ac1), however, 
did include items which were generally more neutral in desir- 
ability than those on the other form. 


The two desirability scales were constructed in a similar manner,
except that it was impossible to counterbalance content because of
the previously noted confounding of desirability and content scoring.
Instead, the content of each desirability scale was made
heterogeneous by including one item from each of the ten content
scales. The items were also selected so that five items on each scale 
were keyed "true" and five “false” in the desirable direction, thus 
offsetting the influence of acquiescence on the total desirability 
score. The two scales differed, however, in their general level of 
item desirability, one of them (Ds2) containing consistently more 
extreme items than the other. 

In addition, the short form of Hanley’s (1957) measure of test- 
taking defensiveness was also administered, the nine “true” and 
nine “false” items on this scale being scored separately. These scores 
were included in the analysis along with the content scales, but it 
should be noted that the Defensiveness scale is basically a stylistic 
measure closely related to the tendency to respond desirably. 


Procedure 


The 100 items from the ten content areas and the 18 items from 
Hanley’s Defensiveness scale were combined in a random sequence 
and presented as a personality inventory to 145 subjects. A four- 
point response scale of “Strongly Agree—Agree—Disagree—Strongly 
Disagree” was provided, but the responses were dichotomized for 
the present scoring purposes. 
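
The scoring just described can be sketched in a few lines. This is only an illustration under stated assumptions: the numeric coding of the four response categories (1 = Strongly Agree through 4 = Strongly Disagree), the item labels, and the function names are hypothetical and do not come from the inventory itself.

```python
from typing import Dict, List


def dichotomize(response: int) -> int:
    """Collapse a four-point response to 1 (agree) or 0 (disagree).

    Assumed coding: 1 = Strongly Agree, 2 = Agree,
    3 = Disagree, 4 = Strongly Disagree.
    """
    if response not in (1, 2, 3, 4):
        raise ValueError("response must be 1, 2, 3, or 4")
    return 1 if response <= 2 else 0


def score_subscales(
    responses: Dict[str, int],
    true_items: List[str],
    false_items: List[str],
) -> Dict[str, int]:
    """Score the 'true' and 'false' halves of one content scale separately.

    Both halves are keyed in the content direction: a true-keyed item
    earns a point when endorsed, a false-keyed item when rejected.
    """
    agree = {item: dichotomize(value) for item, value in responses.items()}
    true_score = sum(agree[item] for item in true_items)
    false_score = sum(1 - agree[item] for item in false_items)
    return {
        "true": true_score,
        "false": false_score,
        "total": true_score + false_score,
    }


# Hypothetical respondent and a hypothetical six-item scale.
example = {"t1": 1, "t2": 2, "t3": 3, "f1": 4, "f2": 3, "f3": 1}
scores = score_subscales(example, ["t1", "t2", "t3"], ["f1", "f2", "f3"])
```

Keeping the two halves as separate scores, rather than only their sum, is what later allows acquiescence to appear as a factor with opposite signs on the two halves.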


Results 


Intercorrelations among the ten “true” content scores, the ten 
“false” content scores, the two parts of Hanley’s Defensiveness 
scale, and the acquiescence and desirability measures are presented 
in Table 1, along with reliabilities for these variables. Reliabilities 
were computed using Kuder-Richardson Formula 21, which yields 
a lower bound estimate of internal consistency that is also attenu- 
ated by the dispersion of item popularities. In spite of this bias 


Footnote. Copies of the two acquiescence scales and the two desirability scales may
be obtained without charge from the author, or for a fee from the American
Documentation Institute. Order Document number 6926, remitting $1.25 for
35 mm. microfilm or $1.25 for photoprints, from Chief, Photoduplication
Service, Library of Congress, Washington 25, D. C.


TABLE 1
Intercorrelations Among Content and Stylistic Measures
[Table body not recoverable from the source scan. Table note: the subscripts T and F refer to "true" and "false" scales, respectively.]


toward underestimation, however, several of the scales received co- 
efficients which, in view of the extremely small number of items 
involved, could be considered substantial. An examination of the 
correlations between true and false parts of the same content scale, 
underlined in Table 1, is also revealing. Three of the scales, Com- 
pulsiveness, Conformity in Controlling Aggression, and Defensive- 
ness received negative coefficients, indicating the predominant in- 
fluence of acquiescence on these scores. Several other scales, how- 
ever, received high positive correlations, especially in relation to 
their reliabilities, suggesting substantial response consistency prob- 
ably attributable to either content, desirability, or a combination 
of the two. 
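
For readers who wish to reproduce reliabilities of the kind reported above, Kuder-Richardson Formula 21 requires only the number of items, the mean, and the variance of the total scores. The sketch below is illustrative; the function name is hypothetical, and the use of the population variance (dividing by N rather than N - 1) is an assumption, since the text does not say which was used.

```python
from statistics import mean, pvariance
from typing import Sequence


def kr21(total_scores: Sequence[float], n_items: int) -> float:
    """Kuder-Richardson Formula 21 for dichotomously scored items.

    r = (k / (k - 1)) * (1 - M * (k - M) / (k * s^2))
    where k is the number of items, M the mean total score,
    and s^2 the variance of the total scores.
    """
    if n_items < 2:
        raise ValueError("KR-21 needs at least two items")
    m = mean(total_scores)
    variance = pvariance(total_scores)  # assumption: population variance
    if variance == 0:
        raise ValueError("total scores have zero variance")
    k = n_items
    return (k / (k - 1)) * (1 - m * (k - m) / (k * variance))
```

Because the formula treats all items as equally difficult, any spread in item popularities lowers the coefficient, which is the attenuation noted in the text.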

The 22 by 22 matrix of intercorrelations among the “true” con- 
tent scales, the “false” content scales, and the defensiveness meas- 
ures was factor analyzed by the method of principal axes, with 
communalities estimated by the largest correlation in each column. 
An examination of the relative sizes of the 22 latent roots led to 
the retention of eight factors for rotation to oblique simple struc- 
ture—three large ones, together accounting for about two-thirds of 
the common variance, and five smaller ones. Prior to rotation, the 
two acquiescence and two desirability scales were projected into 
the eight-dimensional space by extension methods (Dwyer, 1937; 
Mosier, 1938). This particular mode of analysis was adopted to 
see whether sufficient acquiescence and desirability variance was 
present within the content scales themselves to generate stylistic 
dimensions, which could subsequently be interpreted in terms of 
the marker variables extended into the factor space. Thus, the ex- 
traction of separate factors identifiable as acquiescence and desir- 
ability in the present study is not due to interrelationships among 
the specially constructed set scores, but results instead from con- 
sistent stylistic responses to the content scales themselves. 
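
Two of the computational steps in this paragraph can be sketched briefly: the communality estimate (largest correlation in each column) and the projection of a marker scale into an existing factor space. The sketch is illustrative only. It takes the largest absolute off-diagonal value as the estimate, and it uses a least-squares form of the extension that assumes mutually orthogonal loading columns, as unrotated principal axes are; the exact arithmetic in Dwyer (1937) and Mosier (1938) may differ in detail. The extraction of the principal axes themselves is not shown, and all names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def reduced_correlation_matrix(r: Sequence[Sequence[float]]) -> Matrix:
    """Replace each diagonal entry with the largest absolute
    off-diagonal correlation in its column (communality estimate)."""
    n = len(r)
    reduced = [list(row) for row in r]
    for j in range(n):
        reduced[j][j] = max(abs(r[i][j]) for i in range(n) if i != j)
    return reduced


def extension_loadings(
    loadings: Sequence[Sequence[float]],
    new_correlations: Sequence[float],
) -> List[float]:
    """Least-squares loadings of a new variable on existing factors.

    loadings: n_variables x n_factors unrotated loadings whose columns
        are assumed mutually orthogonal.
    new_correlations: correlations of the new variable with the same
        n_variables, in the same order.
    """
    n_factors = len(loadings[0])
    result = []
    for j in range(n_factors):
        column = [row[j] for row in loadings]
        numerator = sum(f * r for f, r in zip(column, new_correlations))
        denominator = sum(f * f for f in column)
        result.append(numerator / denominator)
    return result
```

Since the extended variables contribute nothing to the extraction, any factor on which they load must already be present in the variables that were analyzed, which is the point made in the text.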

Apart from a correlation of .24 between the first two dimensions, 
the factor space is essentially orthogonal, as seen from the correla- 
tions among the primary factors given in Table 3. The rotated 
factor loadings are presented in Table 2, and it should be noted 
that the communalities listed there for the four measures of response
style are quite substantial, ranging from .40 to .68. In view
of the large proportion of unreliable variance inevitably attendant
upon the use of such short scales, the communalities obtained for
the true and false content measures are also surprisingly large, only
three of them falling below .37.

The first factor listed in Table 2 appears to represent a second- 
order complex of unconventional vs. conventional characteristics 
generated by correlations between content scales rather than by 
within-scale variance. High negative loadings were received on this 
factor for both true and false Affective-Effective scales, both Un- 
conventionality scales, both Impulsiveness scales, both Self-Suffi- 
ciency scales, and both Tolerance of Ambiguity scales, while high 
positive loadings were obtained for Achievement Striving (true), 
Conformity in Controlling Aggression (false), Defensiveness (false), 
and Desirability (Ds2). The factor thus appears to emphasize a 


TABLE 2
Rotated Factor Loadings

                                                Factor
Scale                             I    II   III    IV     V    VI   VII  VIII    h²
 1. Anxiety (T)                 -02   -40    20    11   -16    50   -08    04    50
 2. Tol. Frustration (T)         28    43    14   -18   -05   -08   -03   -05    33
 3. Tol. Ambiguity (T)          -31    44    17   -28   -07    09    01   -17    44
 4. Compulsiveness (T)            ?   -29    37    08   -09    12   -26    23    37
 5. Impulsiveness (T)             ?     ?     ?    38    07   -08    17    17     ?
 6. Self-Sufficiency (T)        -39   -03    16    11    50   -02   -12   -06    46
 7. Conformity in Control (T)    03    28    36    02     ?     ?     ?     ?     ?
 8. Achievement Striving (T)     43    21    33    15    09    13   -09    16    42
 9. Affective-Effective (T)     -43    28    26    12    12    04    12   -13    39
10. Unconventionality (T)       -47   -07    41   -14   -04   -21    08    09    47
11. Defensiveness (T)             ?     ?     ?     ?     ?     ?   -19    16     ?
12. Anxiety (F)                 -03   -37   -18   -12    11    57    07   -03     ?
13. Tol. Frustration (F)         15    31   -21   -33   -07   -08   -06    07    29
14. Tol. Ambiguity (F)          -50    32   -31   -15    06   -04   -07   -04    48
15. Compulsiveness (F)           15   -05   -40    45   -03    02   -01   -05    39
16. Impulsiveness (F)           -50    09   -36   -14    02   -03    05    26    48
17. Self-Sufficiency (F)          ?     ?     ?     ?    59     ?     ?     ?     ?
18. Conformity in Control (F)    40     ?     ?     ?     ?     ?     ?     ?     ?
19. Achievement Striving (F)     07    47     ?     ?     ?     ?     ?     ?     ?
20. Affective-Effective (F)     -58     ?     ?     ?     ?     ?     ?     ?     ?
21. Unconventionality (F)       -48    17   -08    10    04   -13   -25   -24    41
22. Defensiveness (F)            34    14   -32   -21    08    04    28    02    37
23. Acquiescence (Ac1)          -11    04    53    07   -03   -04    30    07    40
24. Acquiescence (Ac2)            ?     ?    62     ?     ?     ?     ?     ?     ?
25. Desirability (Ds1)           27    52   -06    10   -26   -01    15   -20    49
26. Desirability (Ds2)           63     ?     ?     ?     ?     ?     ?     ?     ?

Note. Decimal points are omitted. T and F denote the "true" and "false" subscales; h² is the communality. Entries marked ? are illegible in the source. The loadings shown for scale 17 on factor V, scale 24 on factor III, and scale 26 on factor I are the values quoted in the text.




TABLE 3
Intercorrelations Among Primary Factors

Factor      I    II   III    IV     V    VI   VII  VIII
I          --    24   -07    02    01    00    00    01
II         24    --     ?    04    07   -02    00    01
III       -07     ?    --    05   -06    04    00    03
IV         02    04    05    --    11   -11    12    17
V          01    07   -06    11    --   -17   -10   -07
VI         00   -02    04   -11   -17    --    02    01
VII        00    00    00    12   -10    02    --    00
VIII       01    01    03    17   -07    01    00    --

Note. Decimal points are omitted. The II-III entry reads -08 in one row and -03 in the other in the source and is therefore shown as ?.


kind of affective impulsivity at one extreme vs. a controlled con- 
ventionality at the other. This dimension is also related to desir- 
ability, as indicated by the .63 loading for Ds2, suggesting that a 
generalized tendency to respond desirably would tend to decrease 
scores on affective, unconventional, and impulsiveness scales. The
correlation between loadings on this factor and mean judged desirability
values for the scales (derived by averaging the item
desirability ratings previously obtained from these respondents)
was .43.

High positive loadings were obtained on the second factor for 
both Desirability scales, both true and false Tolerance of Ambiguity 
and Tolerance of Frustration scales, Achievement Striving (false), 
and Defensiveness (true), while high negative loadings were ob- 
tained for the two Anxiety scales and Compulsiveness (true). Since, 
in addition to the high saturations for the two Desirability scores, 
the loadings on this second dimension were also found to correlate 
.78 with the average desirability ratings for the content scales, this
factor was interpreted as the tendency to respond desirably. However,
with anxiety and compulsiveness at one extreme and the
tolerances of ambiguity and frustration at the other, an alternative
interpretation in terms of scale content would probably emphasize
some aspect of maladjustment vs. adjustment. With this latter interpretation,
a desirability factor would be considered to lie somewhere
between the first two dimensions, since the multiple correlation
between loadings on these first two factors in predicting the
average desirability values of the scales was found to be .89.
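
The relation among the three coefficients reported for the first two factors (.43, .78, and the multiple correlation of .89) can be checked with the standard two-predictor formula. The sketch below is illustrative: the function name is hypothetical, and the correlation between the two columns of loadings is not reported in the text, so a value of zero is assumed here purely for the sake of the check.

```python
import math


def multiple_r(r_y1: float, r_y2: float, r_12: float) -> float:
    """Multiple correlation of a criterion with two predictors.

    R^2 = (r_y1^2 + r_y2^2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12^2)
    """
    if abs(r_12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


# Assumption: the two loading columns are treated as uncorrelated (r_12 = 0).
check = multiple_r(0.43, 0.78, 0.0)
```

Under that assumption the result is close to the reported .89, which is consistent with the view that judged desirability lies in the plane of the first two factors rather than along either one alone.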

The third factor is clearly identifiable as acquiescence, since the
two acquiescence scales received loadings of .53 and .62 on this
dimension, and all of the “true” content scales received positive 
loadings and all of the “false” content scales negative loadings. As 
might be expected, the order of the loadings on this acquiescence 
factor is predictable from the size of the correlations between true 
and false parts of each content scale, as evidenced by a rank corre- 
lation of —.92 between the true-false coefficients and the average 
of the absolute loadings of corresponding true and false subscales 
(the negative loadings being appropriately reflected). Thus, those 
scales with negative or small positive correlations between their 
true and false halves tended to receive high loadings on the third 
factor—high positive loadings for the true items and high negative 
loadings for the false component. Although the obtained pattern 
of test vectors on this third factor is easily interpreted, the plots 
of these loadings against other factors, particularly factor II, do 
not always meet criteria of simple structure. In fact, the plot of 
factor III against factor II is circular in appearance (cf. Jackson 
& Messick, 1960), suggesting that both extremely desirable and 
extremely undesirable scales have very low acquiescent compo- 
nents, whereas scales neutral in desirability have very marked 
acquiescence effects. 

Factor IV appears to be largely specific to Compulsiveness (false),
although there is some tendency for the negative pole to be defined 
by the tolerances of ambiguity and frustration, suggesting an in- 
strumental role for compulsivity in controlling or adjusting to in- 
tolerance of ambiguity and frustration. 

Factor V is interpreted as a content dimension of Self-Sufficiency, 
since the true and false parts of that scale received loadings of .50 
and .59, respectively. Similarly, factor VI is interpreted as a con- 
tent dimension of Anxiety, with loadings of .50 and .57, respectively,
for true and false parts of the Anxiety scale. 

Factor VII, although small, is of considerable interest, for even 
though it is uncorrelated with factor III, which has been previously 
interpreted in terms of acquiescent response style, its highest load- 
ings were obtained by the two acquiescence scales. It may be that 
this factor represents a substantive aspect of acquiescence inter- 
pretable in terms of content relationships, but if so, interpretation 
is difficult, because very few content scales received loadings over
.20. A moderate positive loading, for example, was also obtained by
Defensiveness (false) and moderate negative loadings by Uncon- 
ventionality (false), Controlled Aggression (false), and Compul- 
siveness (true) (cf. Couch & Keniston, 1960). 

Factor VIII, having only one stray loading as large as .30, was 
not interpreted. 


Discussion 


Of the three largest factors obtained in the present study, to- 
gether accounting for about two-thirds of the common variance, 
two of them (factors II and III) were attributable to the response 
styles of desirability and acquiescence. The other large dimension 
(factor I), which appeared to represent a second-order content 
complex of controlled conventionality vs. impulsive unconventionality,
was also somewhat correlated with the desirability factor and
loaded heavily by a desirability scale. Thus a considerable portion 
of variance in response to these personality items was associated 
with stylistic response tendencies. 

The appearance of factor VII, slight as it is, also opens a possi- 
bility of potential consequence for current procedures of scale con- 
struction and inventory interpretation—namely, the possibility that 
two aspects of response style can be empirically differentiated, one 
related to spurious stylistic effects upon inadequately controlled 
scales and the other to substantive content-style interrelationships. 
The appearance of factors specific to particular content scales, such 
as factors V and VI, is also noteworthy, since such factors are of 
great potential value in constructing uni-dimensional personality 
scales. Item analyses, for example, could be performed using factor 
scores on such specific content dimensions as criteria, rather than 
the usual total scale scores with their inevitable response set con- 
taminations. The properties of items particularly facile in eliciting 
response styles could also be investigated through similar item 
analyses against stylistic factor scores. 

The relative contributions of acquiescence and desirability to 
each of the “true” and “false” content scales in the present study 
can be readily computed from the factor loadings of Table 2, but 
it should be noted that a more complicated procedure involving a
correction for scale variances would be required to estimate the
influence of these response styles on total scales formed by adding
corresponding "true" and "false" components. When "true" and
"false" items are added together to produce balanced content scales,
some of the acquiescence variance cancels out, and the desirability 
and content variance assumes a correspondingly larger proportion 
of the total scale variance. However, one would expect acquiescence 
still to be substantially represented in a similar analysis of com- 
bined scores for content scales such as the present ones, particu- 
larly for those measures where the correlation between “true” and 
“false” parts was negative or considerably smaller than that ex- 
pected from the component reliabilities. 
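
The cancellation described in this paragraph can be made explicit with a deliberately simplified model. The model is an assumption introduced for illustration, not the author's: each half-scale is written as content (c) plus or minus acquiescence (a) plus error, the components are taken to be mutually uncorrelated, desirability is folded into c, and the weights g_T and g_F are hypothetical.

```latex
% Equal (unit) acquiescence weights: acquiescence cancels completely in the sum.
\begin{align*}
x_T &= c + a + e_T, \qquad x_F = c - a + e_F,\\
\operatorname{Cov}(x_T, x_F) &= \sigma_c^2 - \sigma_a^2,\\
\operatorname{Var}(x_T + x_F) &= 4\sigma_c^2 + \sigma_{e_T}^2 + \sigma_{e_F}^2 .
\end{align*}
% Unequal weights g_T and g_F: only part of the acquiescence variance cancels.
\begin{align*}
x_T &= c + g_T\,a + e_T, \qquad x_F = c - g_F\,a + e_F,\\
x_T + x_F &= 2c + (g_T - g_F)\,a + e_T + e_F,\\
\operatorname{Var}(x_T + x_F) &= 4\sigma_c^2 + (g_T - g_F)^2\sigma_a^2
  + \sigma_{e_T}^2 + \sigma_{e_F}^2 .
\end{align*}
```

In this sketch a negative covariance between the two halves arises whenever acquiescence variance exceeds content variance, and acquiescence survives in the total score to the extent that the two halves are unequally sensitive to it.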

The application of factor analysis to intercorrelations among per- 
sonality inventory scales has thus resulted in a separation of 
stylistic and content variance, but has not in principle restricted 
that separation to orthogonal components, as illustrated by the con- 
ceptual and empirical relationship between the “conventional” re- 
sponses of factor I and the “desirable” responses of factor II. Such 
an analysis thus appears to permit a joint consideration of stylistic 
and content variables, as well as their interrelationships, within a
single multi-dimensional framework. 


Summary 


In order to evaluate the respective contributions to personality
inventory scales of consistent responses to item content on one hand 
and of the response styles of acquiescence and desirability on the 
other, factor analysis was applied to intercorrelations among con- 
tent scales and stylistic measures as a potential means of sepa- 
rating distinct, but possibly correlated, dimensions of content and 
style. “True” and “false” items were scored separately for ten rela- 
tively homogeneous content scales and a measure of defensiveness. 
A factor analysis of intercorrelations among these 22 scores yielded 
eight factors—three large ones, together accounting for about two- 
thirds of the common variance, and five small ones. Two measures 
each of the tendencies to acquiesce and to respond desirably were 
subsequently projected into the factor space by extension, in order 
to identify stylistic dimensions. Of the three large factors, two were 
attributable to the response styles of desirability and acquiescence, 
and the third, which appeared to represent a second-order content 
dimension of impulsive unconventionality vs. controlled conven- 
tionality, was also related to desirability variables. Some of the 
smaller factors were attributed to specific content dimensions, since
"true" and "false" parts of single content scales dominated the
loadings. 


REFERENCES 


Berg, I. A. “Response Bias and Personality: The Deviation Hypothesis.”
Journal of Psychology, XL (1955), 61-72.

Chapman, L. J. and Bock, R. D. “Components of Variance Due to
Acquiescence and Content in the F Scale Measure of Authoritarianism.”
Psychological Bulletin, LV (1958), 328-333.

Corah, N. L., Feldman, M. J., Cohen, I. S., Gruen, W., Meadow, A.,
and Ringwall, E. A. “Social Desirability as a Variable in the
Edwards Personal Preference Schedule.” Journal of Consulting
Psychology, XXII (1958), 70-72.

Couch, A. and Keniston, K. “Yeasayers and Naysayers: Agreeing
Response Set as a Personality Variable.” Journal of Abnormal
and Social Psychology, LX (1960), 151-174.

Cronbach, L. J. “Response Sets and Test Validity.” EDUCATIONAL
AND PSYCHOLOGICAL MEASUREMENT, VI (1946), 475-494.

Cronbach, L. J. “Further Evidence on Response Sets and Test Design.”
EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, X (1950), 3-31.

Dwyer, P. S. “The Determination of the Factor Loadings of a
Given Test from the Known Factor Loadings of Other Tests.”
Psychometrika, II (1937), 173-178.

Edwards, A. L. The Social Desirability Variable in Personality
Assessment and Research. New York: Dryden Press, 1957.

Edwards, A. L., Wright, C. E., and Lunneborg, C. E. “A Note on
‘Social Desirability as a Variable in the Edwards Personal
Preference Schedule.’” Journal of Consulting Psychology,
XXIII (1959), 598.

Frederiksen, N. and Messick, S. “Response Set as a Measure of
Personality.” EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT,
XIX (1959), 137-157.

Fricke, B. G. “Response Set as a Suppressor Variable in the OAIS
and MMPI.” Journal of Consulting Psychology, XX (1956),
161-169.

Hand, J. and Reynolds, H. H. “Suppressing Distortion in Tempera[...].”
Journal of Consulting Psychology, XXV [...], 181.

Hanley, C. “Social Desirability and Responses to Items from Three
MMPI Scales: D, Sc, and K.” Journal of Applied Psychology,
XL (1956), 324-328.

Hanley, C. “Deriving a Measure of Test-Taking Defensiveness.”
Journal of Consulting Psychology, XXI (1957), 391-397.

Helmstadter, G. C. “Procedures for Obtaining Separate Set and
Content Components of a Test Score.” Psychometrika, XXII
(1957), 381-393.

Jackson, D. N. and Messick, S. “Content and Style in Personality
Assessment.” Psychological Bulletin, LV (1958), 243-252.

Jackson, D. N. and Messick, S. “Acquiescence and Desirability as
Response Determinants on the MMPI.” EDUCATIONAL AND
PSYCHOLOGICAL MEASUREMENT, XXI (1961), 771-790.

Messick, S. “Dimensions of Social Desirability.” Journal of Consulting
Psychology, XXIV (1960), 279-287.

Messick, S. and Jackson, D. N. “The Measurement of Authoritarian
Attitudes.” EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT,
XVIII (1958), 241-253.

Messick, S. and Jackson, D. N. “Acquiescence and the Factorial
Interpretation of the MMPI.” Psychological Bulletin, LVIII
(1961), 299-304.

Mosier, C. I. “A Note on Dwyer: The Determination of the Factor
Loadings of a Given Test.” Psychometrika, III (1938), 297-299.

Rosen, E. “Self-Appraisal, Personal Desirability, and Perceived Social
Desirability of Personality Traits.” Journal of Abnormal
and Social Psychology, LII (1956), 151-158.

Saunders, D. R. “Some Preliminary Interpretive Material for the
PRI.” Research Memorandum 55-15. Princeton, N. J.: Educational
Testing Service, 1955.

Tucker, L. R. “Factor Analysis of Double Centered Score Matrices.”
Research Memorandum 56-3. Princeton, N. J.: Educational
Testing Service, 1956.

Wahler, H. J. “Social Desirability and Self-Ratings of Intakes,
Patients in Treatment, and Controls.” Journal of Consulting
Psychology, XXII (1958), 357-363.

Webster, H. “Correcting Personality Scales for Response Sets or
Suppression Effects.” Psychological Bulletin, LV (1958), 62-64.

Webster, H. “A Note on ‘Correcting Personality Scales for Response
Sets or Suppression Effects.’” Psychological Bulletin, LVI
(1959), [...].

Wiggins, J. S. and Rumrill, C. “Social Desirability in the MMPI
and Welsh's Factor Scales A and R.” Journal of Consulting
Psychology, XXIII (1959), 100-106.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT
Vol. XXII, No. 1, 1962


BINOMIAL SEQUENTIAL ANALYSIS 
—GOOD ETHICS, LITTLE EFFORT 


A. E. MAXWELL 


Institute of Psychiatry 
University of London 


CLINICAL trials and other psychological and medical experiments 
are often conducted in a sequence in time, as subjects eligible for 
inclusion in the experiment become available. In such experiments 
there may be good ethical reasons why the experimenter should ex- 
amine the results continuously as they become available so that 
delay of effective treatment, if say the subjects are hospital pa- 
tients, can be avoided. When this examination consists in perform- 
ing significance tests of the familiar kind to compare two (or more) 
treatments periodically as the sample size increases, the procedure 
has the effect of exaggerating the real significance of the results 
(Armitage, 1954; Feller, 1940) so that the null hypothesis may be 
rejected too hastily or wrongly. To avoid this danger and to enable 
reliable decisions about the respective merits of the treatments to 
be reached as the experiment continues, special methods of sequen- 
tial analysis should be used. Such methods were introduced over a 
decade ago by Wald (1947) and Barnard (1946) primarily for use 
in industrial inspection work, their special attraction being that a 
pronounced difference between two methods of production, or two
treatments, could be detected in general more quickly by employing
them than by the comparison of mean values based on samples of
fixed size.
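
The danger of repeated significance testing mentioned above is easy to demonstrate by simulation. The sketch below is illustrative only: the design (a binomial outcome with true proportion .5, a two-sided normal-approximation test at the nominal 5 per cent level, a look after every observation from the 10th to the 100th) and all names are assumptions chosen for the demonstration, not a procedure taken from the papers cited.

```python
import math
import random
from typing import Tuple


def rejects_at(successes: int, n: int, p0: float = 0.5, z_crit: float = 1.96) -> bool:
    """Two-sided normal-approximation test of H0: p = p0 after n trials."""
    z = (successes - n * p0) / math.sqrt(n * p0 * (1 - p0))
    return abs(z) > z_crit


def simulate(
    n_max: int = 100,
    first_look: int = 10,
    n_experiments: int = 2000,
    seed: int = 1,
) -> Tuple[float, float]:
    """Return (rejection rate with repeated looks, rate with one final look)
    when the null hypothesis is in fact true."""
    rng = random.Random(seed)
    repeated = 0
    fixed = 0
    for _ in range(n_experiments):
        successes = 0
        rejected_at_some_look = False
        for n in range(1, n_max + 1):
            successes += rng.random() < 0.5  # one Bernoulli(.5) observation
            if n >= first_look and rejects_at(successes, n):
                rejected_at_some_look = True
        repeated += rejected_at_some_look
        fixed += rejects_at(successes, n_max)
    return repeated / n_experiments, fixed / n_experiments


repeated_rate, fixed_rate = simulate()
```

With the null hypothesis true, the proportion of experiments that reach "significance" at some look is several times the nominal level, while the single test at the final sample size stays near it.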

More recently the methods of sequential analysis have been 
adapted for use in clinical trials, notably by the work of Bross 
(1952) and Armitage (1954). But in addition to applying the original
sequential techniques in medical investigations, these writers
(Armitage, 1957, 1958) have shown how they can be modified to 
meet situations in which the number of subjects likely to be avail- 
able in a limited period of time for a particular investigation can 
be taken into account—and they thereby have allayed, as shall be 
seen, certain earlier doubts about the practical value of sequential 
tests. In this article the general procedure when such tests are em- 
ployed will be outlined, but special attention will be given to the 
recent developments in the field (Armitage, 1954, 1957, 1958; Bross, 
1952). Though many of their investigations are essentially of a 
sequential nature, psychologists (clinical ones in particular) have
greatly neglected sequential analysis. Perhaps they will be per- 
suaded to give it a trial when they learn that the most recent 
sequential procedures in binomial situations, with which this article 
deals, require no calculations whatever to be performed: the results, 
as they become available, are simply plotted on a chart.


Sequential Analysis—Basic Notions 


The general ideas involved when sequential tests are used are 
much the same as those involved when the familiar tests of significance
are employed. The experimenter is interested in one or
the other of three possibilities:


(a) of accepting the null hypothesis (H0) that the different meth-
ods of treatment, or attributes being compared, do not differ;
(b) of accepting the alternative hypothesis (H1) that a real dif-


null hypothesis is accepted when in fact some alternative hypothe-
sis (H1) is true. The probability of errors of the first kind will be
referred to by the letter α, while the probability of those of the
second kind will be referred to by β.

Sequential tests are easiest to understand when presented in 
graphic form. Since attention is being confined to binomial sequential
procedures, we will be concerned here only with events of the
"either-or" category, that is, with quantal or dichotomous data,
or with data which can be considered as if they were dichotomous.
The results from a hypothetical experiment might then appear 
serially as follows: 


101110103 010981 


where the symbol "1" indicates that a certain event has happened
and the symbol "0" indicates that it has not happened. The test
consists in plotting on a chart some function of the number of
occurrences and non-occurrences of the event against the number of
trials.

Before commencing the experiment, boundary lines are drawn on
a chart. The equations of these lines are based on mathematical
considerations which involve the probabilities α and β, which give
the risks of incurring errors of the first and second kinds,
respectively, together with certain probabilities associated with the
hypothesis being tested. The procedure will be illustrated by
considering two examples. In the first instance the example taken is a
hypothetical one, but data from a real experiment are reported later.


Hypothetical Example 


It is generally agreed that the proportion of patients who die
during their first heart attack is about 40 per cent. Suppose that a
new drug, which we will call Coronolysin, is found which, it is
claimed, will reduce the death rate to 25 per cent if injected during
the attack. It is proposed to test the claim by experiment. In
this example the standard proportion, p0, of deaths is .40, while the
proportion, p1, under the new treatment is claimed to be .25. To set
up the boundary lines (Figure 1) we require, in addition to the
values of p0 and p1, to decide in advance the values of α and β
which are to be used. If the risk of rejecting the null hypothesis
(H0) when it is true is set at the 5 per cent level, then α is equal to
.05, while if the risk of accepting the null hypothesis when in fact
the alternative hypothesis (H1) is true is set at the 2 per cent level,
then β is .02. These values of α and β should be chosen with the
relative seriousness of committing errors of one or the other kind
in mind. Naturally they may differ from one experiment to another
and, if thought desirable, may be taken equal. In the present
experiment the claim put forward for the new drug would lead us to
consider it carefully, and so it is reasonable to set β at a lower level
than α so that the drug has more than an equal chance of
demonstrating its effectiveness.

The equations of the boundary lines used in sequential analysis
can be shown (Davies, 1954) to be:


\[ y_0 = h_0 + ns \tag{1} \]
\[ y_1 = h_1 + ns \tag{2} \]

respectively, where

\[ h_0 = -\frac{1}{c}\,\log_e\left[\frac{1-\alpha}{\beta}\right] \tag{3} \]
\[ h_1 = \frac{1}{c}\,\log_e\left[\frac{1-\beta}{\alpha}\right] \tag{4} \]

and

\[ s = \frac{1}{c}\,\log_e\left[\frac{1-p_0}{1-p_1}\right] \tag{5} \]

where

\[ c = \log_e\left[\frac{p_1 - p_0 p_1}{p_0 - p_0 p_1}\right]. \]


Since in this example p0 = .40, p1 = .25, α = .05 and β = .02, we
find on substitution in the above equations that c = -.69, h0 =
5.57, h1 = -4.29 and s = .32. Entering these values in equations
(1) and (2), the equations of the boundary lines LM and PQ (Figure 1)
are found to be


\[ y_0 = 5.57 + .32n \]

and

\[ y_1 = -4.29 + .32n \]

where n is the ordinal number of the observations.
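
As a check on the arithmetic, equations (3) to (5) and the expression for c can be evaluated directly. The following is a minimal Python sketch and not part of the original article; the function name `sprt_boundary_constants` is hypothetical, and natural logarithms are assumed throughout.

```python
import math


def sprt_boundary_constants(p0, p1, alpha, beta):
    """Return (c, h0, h1, s) for the binomial sequential test.

    p0, p1 : proportions under the null and alternative hypotheses
    alpha  : risk of an error of the first kind
    beta   : risk of an error of the second kind
    """
    # c is the log of the odds ratio of p1 relative to p0.
    c = math.log((p1 - p0 * p1) / (p0 - p0 * p1))
    h0 = -math.log((1 - alpha) / beta) / c   # equation (3)
    h1 = math.log((1 - beta) / alpha) / c    # equation (4)
    s = math.log((1 - p0) / (1 - p1)) / c    # equation (5)
    return c, h0, h1, s


c, h0, h1, s = sprt_boundary_constants(p0=0.40, p1=0.25, alpha=0.05, beta=0.02)
print(f"c = {c:.2f}, h0 = {h0:.2f}, h1 = {h1:.2f}, s = {s:.2f}")
# prints: c = -0.69, h0 = 5.57, h1 = -4.29, s = 0.32
```

With the values of the hypothetical example the sketch reproduces the intercepts and slope of the two boundary lines quoted above.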


of recoveries, indicated b 


the results, as they become available from the experiment, are: 


100000100000000010000010000. 


Figure 1. Chart showing boundary lines for comparing an observed proportion
with a population value. The zigzag line is a plot of the results given on
page 60.


The results are now plotted on the chart (Figure 1). The numbers 
on the horizontal scale represent the ordinal numbers of the pa- 
tients and for each point on this scale the ordinate is equal to the 
number of deaths which have occurred up to date. In other words, 
for each n-value in the figure the y-value is the sum of the entries 
in the series up to and including the n-value. The points on the 
chart can now be joined by a line. When this line, which starts at 
the origin, crosses either of the boundaries sampling is stopped. If it
crosses the LM boundary the null hypothesis is accepted and it is
concluded that the drug is not having the promised effect. If it
crosses the PQ boundary, the alternative hypothesis (H1), that the
drug has justified the claim made on its behalf, is accepted. While
the line remains between the two boundaries, sampling continues.
For the hypothetical data under consideration the PQ boundary
is crossed when 27 patients have been treated, and so the hypothesis
H1 is accepted.

In practice it is not always the best procedure to plot the results,
as has been done in Figure 1, for the scales on the chart can be
troublesome when the slope of the boundary lines is steep. In such
cases it is preferable to calculate for each value of n the values of
y0 and y1 given by equations (1) and (2), and to stop sampling
when the sum of the 1's up to and including a given n falls outside
the values of y0 and y1 for that n.
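
The numerical form of the stopping rule just described can be sketched in a few lines of Python. This is an illustration only: the function name `sequential_decision` is hypothetical, the constants are passed in as arguments, and reading "falls outside" as a strict inequality is an assumption.

```python
def sequential_decision(outcomes, h0, h1, s):
    """Apply the stopping rule without a chart.

    outcomes : iterable of 0/1 results in the order observed
    h0, h1, s: intercepts and slope of boundary lines (1) and (2)
    Returns (decision, n), where n is the number of observations used.
    """
    total = 0
    n = 0
    for n, x in enumerate(outcomes, start=1):
        total += x                      # running sum of the 1's
        if total > h0 + s * n:          # above the upper line: H0 accepted
            return "accept H0", n
        if total < h1 + s * n:          # below the lower line: H1 accepted
            return "accept H1", n
    return "continue sampling", n


# Synthetic illustrations with the rounded constants of the example:
print(sequential_decision([0] * 20, h0=5.57, h1=-4.29, s=0.32))  # ('accept H1', 14)
print(sequential_decision([1] * 20, h0=5.57, h1=-4.29, s=0.32))  # ('accept H0', 9)
```

The two synthetic series show the shortest possible runs to a decision under these boundaries: fourteen consecutive 0's or nine consecutive 1's.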


Sample Size 


One objection that has been lodged against sequential tests is 
that the zigzag line, starting at the origin, might never cross either 
boundary so that in effect the experiment might go on forever. As 
this is a possibility against which most experimenters wish to guard, 
sequential methods of analysis are not as popular as they deserve 
to be. But the danger has been overemphasized: as has been said, 
the chances that the experiment might go on forever are about the 
same as the chances that a game of tennis might go on forever since 
the rules for stopping in both cases are qualitatively similar. How- 
ever, to alleviate fear there are two considerations to be borne in 
mind. The first is that the average sample sizes required for a de- 
cision if 

(a) H0 is true,

(b) H1 is true,

(c) the zigzag line follows a path half way between and roughly
parallel to the boundaries,

are respectively (Davies, 1954, p. 85):


(a) \[ \bar{n}_0 = \frac{(1-\alpha)h_0 + \alpha h_1}{p_0 - s} \tag{6} \]

(b) \[ \bar{n}_1 = \frac{\beta h_0 + (1-\beta)h_1}{p_1 - s} \tag{7} \]

(c) \[ \bar{n}_s = \frac{-h_0 h_1}{s(1-s)} \tag{8} \]
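
The three average sample sizes can be checked numerically. The sketch below is an illustration and not part of the original article; the function name `average_sample_sizes` is hypothetical. It assumes the rounded constants h0 = 5.57, h1 = -4.29 and s = .32 of the hypothetical example; carrying more decimal places in the constants changes the figures slightly, because p0 - s is a small difference.

```python
import math


def average_sample_sizes(p0, p1, alpha, beta, h0, h1, s):
    """Average sample sizes given by equations (6), (7) and (8)."""
    n_h0 = ((1 - alpha) * h0 + alpha * h1) / (p0 - s)   # H0 true
    n_h1 = (beta * h0 + (1 - beta) * h1) / (p1 - s)     # H1 true
    n_mid = -h0 * h1 / (s * (1 - s))                    # path midway between the lines
    return n_h0, n_h1, n_mid


sizes = average_sample_sizes(0.40, 0.25, 0.05, 0.02, h0=5.57, h1=-4.29, s=0.32)
print([math.ceil(v) for v in sizes])   # [64, 59, 110]
```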


In our hypothetical example these values, rounded up to the nearest
whole number, are 64, 59 and 110, respectively. If a sample of the
relative size indicated by these results is not considered feasible in
view of the rate at which subjects turn up, or in consideration of the
ethical problems which might arise, the values of α, β, and p1 might
require adjustment in the hope of reaching a decision with a smaller
sample. Here Wald's (1947) discussion of truncated sequential tests
and variations and elaborations of Wald's ideas (Bross, 1952;
Armitage, 1957) are relevant; they constitute the second of the two
considerations referred to above and they will be dealt with
shortly. But, in passing, we should note that apart from the
method of analysis just described for comparing a new treatment
with a standard one, the theory of sequential tests allows two new
treatments to be compared with each other (Wald, 1947; Davies,
1954). The earlier approach will not, however, be described since a
modified version of it, due to Armitage (1957; Snell & Armitage,
1957), has certain attractions. This version meets at one and the
same time the desire to examine experimental results continuously
as they become available, and the natural desire of the investigator
to plan his experiment in the light of the number of subjects likely
to be available in the time at his disposal. The new method is
referred to as a "restricted sequential procedure."


Restricted Sequential Procedures 


Only the binomial test for comparing two proportions is considered
here. It is applicable in two types of situation:


(i) when each of two treatments A and B is given to each subject,
in random order, and either the subjects or the experimenter
makes a judgment as to which treatment is the more successful
or is preferred; or

(ii) the subjects are paired as they enter the experiment, one chosen
randomly being allotted to treatment A and the other to
treatment B, and again a judgment regarding the values of the
treatments is made.


Although the statistical comparisons concern only two treatments
at a time, the experiment, as will be seen in the next example, can
be designed to take more than two treatments into account.

In each of the situations i and ii the trials should be conducted
blind: neither the subject nor the experimenter who is to make
the judgment about the treatments should know in any particular
case which treatment is involved, provided he is informed when
a change of treatment occurs. Under these conditions the ethical
requirements of continuously examining the results of the trials as
they become available can be carried out by a third person not
involved in the experiment as such.


When the experiment is conducted according to i or ii above, the
results are obtained as a series of preferences of the form AB or BA
depending on whether A or B respectively was the treatment preferred.
Ties are discarded as they provide no information regarding
the relative merits of the treatments. In a situation in which A and
B have the same effect, the number of AB preferences would tend to
equal the number of BA preferences. With restricted sequential
procedures the abscissa (horizontal axis in the chart) shows the number
of preferences, while the ordinate (vertical axis) shows the
difference between the number of A-preferences and the number of
B-preferences (see Figure 2). For each A-preference a step on the
chart is taken in a N-E direction, while for each B-preference a
step is taken in a S-E direction. If the effects of the two treatments
happen to be equal, the zigzag line will move across the chart in
roughly a horizontal direction, otherwise it will move upward in
favor of A or downward in favor of B.


Figure 2. Chart for restricted sequential procedures showing the outer
boundaries LM and L'M' and the inner boundary KOK'. The zigzag line is a
plot of the results given on page 67.


The Boundaries


To fix the boundaries in the case of a restricted sequential test,
it is necessary for the experimenter to decide on the sample size
(the number of subjects, or pairs of subjects) in advance. When this is
done he refers to a table, provided by Armitage (1957), an extract
from which appears in Table 1 below, in which the intercepts, a,
and the gradients, b, of the boundary lines he requires are given.
From these the two equations


\[ y_0 = a + bn \tag{9} \]
\[ y_1 = -a - bn \tag{10} \]

for the upper (U) and lower (L) boundary lines, respectively, are
set up. When the results become available, a zigzag line is plotted
on the chart. If it crosses the upper boundary sampling is stopped
and a decision made in favor of treatment A, while if it crosses the
lower boundary a decision is made in favor of treatment B. This
procedure is the same as that already explained, but with restricted
sequential tests an interesting new feature enters. Since the segments
of the zigzag line are at angles of 45° to the x-axis, it is
known before the full quota of N patients, or N pairs of patients,
has been tested that a difference between the treatments will not be
established once the zigzag line crosses the wedge-shaped or inner
boundary in Figure 2. These boundaries are drawn by finding the
two points K and K' on the vertical line through N which are unit
distance from LM and L'M' respectively. From these points, lines
at angles of 45° to the x-axis are drawn back to that axis.


Experimental Results


The procedures just outlined will now be illustrated by some
experimental results. A drug trial was performed to test the relative
efficiencies of the two tranquilizers, Sodium Amytal (S) and
Perphenazine (P). For control purposes, a placebo (C) was also
included in the trial. The drugs and placebo were administered in the
form of tablets of exactly similar appearance. The experiment was
conducted as a blind trial, neither the patients nor the registrar, 
who was later to report on their progress, knowing on any occasion 
which drug was being administered; the latter was known only to 
the psychiatrist in charge. The patients were kept on each drug for 
two days, the patients and the registrar being informed when the 
treatment was changed. The patients were asked to keep the effects 
of each drug in mind so that at the end of the experiment they 
could say which of each pair of treatments they thought was the 
more beneficial. The registrar too knew that at the end of the trial 
he would be required to rate the effects of the treatments on the pa- 
tients. The three treatments were administered in random order to 
successive groups of six patients to cover the six possible orders in 
which three treatments can be arranged, namely: 


               Order
1     2     3     4     5     6
S     P     C     C     S     P
P     C     S     P     C     S
C     S     P     S     P     C


Previous to commencing the trial, it was decided that the available
facilities would allow no more than 30 patients to be treated.
As this number is a multiple of 6, the number of possible different
arrangements of the treatments, and since it is one of the sizes of
sample for which details of the boundary lines (Table 1) are
provided, an N of 30 was agreed upon. The constants a and b for the


TABLE 1
Intercept (a) and Slope (b) for the Boundary Lines for Different Sample Sizes
(N) for Restricted Sequential Procedures with α = .025 and β = .05
(Armitage, 1957; Table 5 abridged)


Proportion of                            Maximum Number
Untied Pairs     Intercept     Slope     of Untied Pairs
     θ1              a           b              N
    .65           11.75        .1524           192
    .70            8.59        .2058           105
    .75            6.62        .2619            65
    .80            5.25        .3219            48
    .85            4.19        .3882            30
    .90            3.31        .4650            21
    .95            2.47        .5640            14


equations of the boundary lines were then substituted in equations
(9) and (10) to obtain


\[ y_0 = 4.19 + 0.39n \]

and

\[ y_1 = -4.19 - 0.39n. \]

These lines are plotted in Figure 2. 
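
The restricted procedure can also be followed numerically rather than on a chart. The sketch below is illustrative only: the function name `restricted_decision` is hypothetical, and the treatment of the inner boundary is an assumed reading of the construction described earlier, namely that K lies one unit (measured vertically) inside the outer line at n = N and that a 45° line runs from K back to the horizontal axis.

```python
def restricted_decision(preferences, a, b, n_max):
    """Follow the zigzag line for a restricted sequential test.

    preferences : iterable of "A" or "B" (ties already discarded)
    a, b        : intercept and slope of the outer boundary lines
    n_max       : maximum number of preferences, N
    Returns (decision, n).
    """
    # Height of K on the vertical line through N (assumed: one unit
    # below the upper boundary, measured vertically).
    k = a + b * n_max - 1.0
    y = 0
    n = 0
    for n, pref in enumerate(preferences, start=1):
        y += 1 if pref == "A" else -1       # N-E step for A, S-E step for B
        if y >= a + b * n:                  # upper outer boundary
            return "prefer A", n
        if y <= -(a + b * n):               # lower outer boundary
            return "prefer B", n
        if abs(y) <= k - (n_max - n):       # inside the wedge: outer lines unreachable
            return "no difference established", n
    return "continue sampling", n


# Synthetic illustrations with a = 4.19, b = 0.39 and N = 30:
print(restricted_decision(["A"] * 30, 4.19, 0.39, 30))        # ('prefer A', 7)
print(restricted_decision(["A", "B"] * 15, 4.19, 0.39, 30))   # ('no difference established', 16)
```

Because each step changes the ordinate by exactly one unit while the outer lines rise more slowly than 45°, a path that has entered the wedge can no longer reach either outer line within N preferences, which is why sampling may stop there.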

The patients were administered the drugs and they later rated
them by paired comparisons. For sodium amytal (S) and perphenazine
(P) the results were as follows, the preferred treatment being


PS SP PS SP SP PS PS PS SP PS PS pg gp gp 

PS PS PS PS PS SP SP SP SP SP РЗ pg 
These values are plott 
Move in a N-E directi 
orded, and one in a S- 
ine crossed the middle 
S far as the compariso; 
18 concerned, the expe 
tepted. Had the zigza 


aim to significance at a probability level smaller than the 5 per 
nt level used when 


the paired comparisons between each of the drugs and the 
Placebo could also h 
gure 2, but the de 
Enough has now 


2e available in a 
nificance of the 
mitage’s restrieted 


Interpretation of Results


The data in Table 1 have been worked out for a value of α equal
to .025 (corresponding to a two-tail test at the 5 per cent level) and
a value of β equal to .05. The boundary lines are defined in such a
way that if two treatments are in fact equivalent, so that in the
long run half the patients will favor one and half the other, there
is a probability of about .95 that the zigzag line will end on the
middle boundary, and a probability of about .05 that it will reach
one of the outer boundaries first.

In the restricted sequential procedure, too, success rates p1 and
p2 for the two treatments are not explicitly stated, but they are
incorporated in the proportion θ1 given in Table 1. This is the
expected proportion of untied pairs and is given by the formula


\[ \theta_1 = \frac{p_1(1 - p_2)}{p_1(1 - p_2) + p_2(1 - p_1)} \]


On the null hypothesis that p1 = p2, θ1 = 1/2. If p2 is greater than
p1, the second treatment is better than the first, and conversely. For
a set of 30 preferences (N = 30, Table 1) θ1 is seen to be .85. This
indicates that, if either treatment is so much better than the other
that in the long run 85 per cent of patients will show a preference
for it, the zigzag line will cross the outer boundary on the average
95 times in every 100 with samples of 30 preferences.

One final comment: as mentioned previously, a common procedure
when conducting sequential tests is to pair the subjects as
they enter the experiment and then to allot the members of each
pair randomly to two treatments being compared. Pairing in this
way does not imply matching, but often in sequential testing
procedures it may be desirable to use matched pairs of subjects.
Otherwise differences of age, sex, and a score of other factors, which
might render the results of the experiment ambiguous, go uncontrolled.
When matched pairs are used, the methods of sequential
analysis have to be modified to take account of possible correlation
between the members of a pair. To deal with this situation, Billewicz
(1956) has supplied two models which may be employed, and the
student who is interested in sequential tests for proportions using
matched pairs of subjects should consult his paper.


Summary


It is pointed out that clinical trials and other psychological and
medical experiments are often conducted in a sequential manner as
subjects suitable for the experiment become available. When this
is so, both ethical and statistical problems arise: the former because
it is undesirable to continue using an inferior treatment, say
with hospital patients, if an alternative treatment clearly is more
effective; the latter because a statistical examination of results as
they become available requires the use of special sequential techniques.
In this article, binomial sequential techniques are reviewed
and an attractive new method is described which requires no
calculations to be performed and which allows the investigator to fix his
sample size in advance.


REFERENCES


Armitage, P. "Sequential Tests in Prophylactic and Therapeutic
 Trials." Quarterly Journal of Medicine, XXIII (1954), 235-274.
Armitage, P. "Restricted Sequential Procedures." Biometrika, XLIV
 (1957), 9-26.
Armitage, P. "Sequential Methods in Clinical Trials." American
 Journal of Public Health, XLVIII (1958), 1395-1402.
Barnard, G. A. "Sequential Tests in Industrial Statistics." Journal
 of the Royal Statistical Society (Supplement), VIII (1946), 1-21.
Billewicz, W. Z. "Matched Pairs in Sequential Trials for the
 Significance of a Difference Between Proportions." Biometrics, XII
 (1956), 283-300.
Bross, I. "Sequential Medical Plans." Biometrics, VIII (1952), 188-
Davies, O. L. The Design and Analysis of Industrial Experiments.
 Edinburgh: Oliver and Boyd, 1954.
Feller, W. "Statistical Aspects of ESP." Journal of Parapsychology,
 IV (1940), 271-298.
Snell, E. S. and Armitage, P. "Clinical Comparison of Diamorphine
 and Pholcodine as Cough Suppressants." The Lancet, i (1957),
Wald, A. Sequential Analysis. New York: John Wiley and Sons,
 1947.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT
Vol. XXII, No. 1, 1962


SOCIAL DESIRABILITY AND EXPECTED MEANS
ON MMPI SCALES¹


ALLEN L. EDWARDS 
University of Washington 


The probability that a personality statement will be endorsed by
subjects when they are asked to describe their own personalities in
terms of the statement has been found by Edwards (1953a) to be an
increasing linear function of the social desirability scale value of
the statement (r = .87). Thus, knowing the social desirability scale
value of a personality statement, it is possible to predict, fairly
accurately, the theoretical relative frequency with which it will be
answered True by a given group of subjects.

Further consideration of the relationship between the probability
of item endorsement and social desirability scale value has resulted
in the concept of a "socially desirable response." A socially desirable
response is defined as a True response to a statement with a socially
desirable scale value or a False response to a statement with a
socially undesirable scale value (Edwards, 1957). Originally, a
79-item scale designed to measure the tendency of subjects to give
socially desirable responses to personality statements was developed
and was called a Social Desirability (SD) scale (Edwards, 1953b).
The 79-item SD scale was later subjected to item analysis and reduced
to 39 items which best differentiated between a high and low
scoring group (Edwards, 1957). Scores on both the 79 and 39-item
scales have been found to be substantially correlated with
scores on other personality scales of the True-False type (Edwards,
1958, 1957; Fordyce, 1956; Edwards, Heathers & Fordyce, 1960).
[a 
Road duly сооз В A o у grant from the Agnes Н. Anderson 




It is the purpose of the present study to show how it is possible to
predict the mean or expected score on a given personality scale in
terms of an estimate of the average probability of a socially desirable
response and the proportion of the items in the scale which are
keyed for socially desirable responses.


Method 


Social desirability scale values for a set of 140 personality statements
were available from an earlier study along with the probabilities
of endorsement for each statement (Edwards, 1953a). A
socially desirable response has been defined as a True response to 
an item with a socially desirable scale value or a False response to 
an item with a socially undesirable scale value. To determine the 
expectancy of a socially desirable response over the social desirabil- 
ity continuum, the average probability of a True response to state- 
ments with socially desirable scale values and of a False response 
to statements with socially undesirable scale values was obtained. 
The average probability of a socially desirable response was found 
to be .81 and the probability provides an estimate of the theoretical 
relative frequency with which a group of subjects may be expected 
to give socially desirable responses to a given set of personality 
statements. The corresponding expectancy for a socially undesirable 
response is estimated as 1 — .81 = .19. 

If the tendency to give socially desirable responses is operating
with respect to a personality scale, then the estimated probability
of a socially desirable response, .81, can be used to predict the
expected or mean score on the scale, provided we also know how
many of the items in the scale are keyed for socially desirable
responses. Suppose, for example, that a scale contains 40 items and
that all of the items are keyed for socially desirable responses. Then
the expected or mean score on the scale should be .81 × 40 = 32.4.
If 30 of the items are keyed for socially desirable responses and 10
are keyed for socially undesirable responses, then the expected or
mean score should be .81 × 30 + .19 × 10 = 24.3 + 1.9 = 26.2.

In terms of a general equation, we have


\[ E = P X_1 + (1 - P) X_2 \tag{1} \]


where E is the expected mean, P is the probability of a socially
desirable response, X1 is the number of items keyed for socially
desirable responses, and X2 is the number of items keyed for socially
undesirable responses. Equation 1 may also be written


\[ E = n(Pp + Qq) \tag{2} \]


where n is the number of items in the scale, P is the estimated
probability of a socially desirable response, p is the proportion of
items in the scale keyed for socially desirable responses, Q = 1 - P,
and q = 1 - p.
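
The two forms of the equation can be checked against the worked figures given earlier. The following Python sketch is an illustration only; the function names `expected_mean` and `expected_mean_from_proportion` are hypothetical.

```python
def expected_mean(P, x1, x2):
    """Equation 1: x1 items keyed socially desirable, x2 keyed undesirable."""
    return P * x1 + (1 - P) * x2


def expected_mean_from_proportion(P, n, p):
    """Equation 2: n items, a proportion p of them keyed socially desirable."""
    Q, q = 1 - P, 1 - p
    return n * (P * p + Q * q)


print(round(expected_mean(0.81, 40, 0), 1))                       # 32.4
print(round(expected_mean(0.81, 30, 10), 1))                      # 26.2
print(round(expected_mean_from_proportion(0.81, 40, 0.75), 1))    # 26.2
```

The last two lines describe the same 40-item scale, once by item counts (30 and 10) and once by the proportion keyed for socially desirable responses (.75), and give the same expected mean.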

Social desirability scale values for the items in the MMPI have
been obtained by Heineman (1953). The value of X1 for each of 43
MMPI scales was found by counting the number of items in each
scale which had socially desirable scale values and which were
keyed True, and the number of items with socially undesirable
scale values which were keyed False. The value of X2 for each
scale was then obtained by subtracting X1 from the total number
of items in the scale. The values of X1 and X2 were then substituted
in Equation 1 with P = .81 to obtain the expected mean for each of
the 43 MMPI scales. To determine how well Equation 1 would
predict the observed or actual means on the MMPI scales, the
expected means were compared with the observed means based
upon a sample of 155 male subjects originally tested by Merrill and
Heathers (1956).


Results 


Figure 1 shows the plot of the observed means against the expected
means for 42 of the 43 MMPI scales. The plotted point for
one scale is not shown in the figure because the point fell in the
extreme upper right-hand corner beyond the limits of the graph as
drawn for publication purposes. The omitted scale is Barron's
Ego strength (Es) scale. The coordinates of the point for the Es
scale are 47.64 and 49.40, and it is clear from these values that the
point is in line with the general trend of those shown in the figure.
The product-moment correlation between the observed and expected
means is .93. For the sample studied, it is quite evident that
Equation 1 provides a fairly good estimate of the mean score on the


y as dbbreviated code names for the 43 MMPI seales, as given by Dahl- 
Td Welsh (1960), are as follows: L, F, K, Hs, D, D-S, D-O, Ну, Hy-S, 
qu» Pd-8, Pd-O, Mi-m, Pa, Pa-S, Pa-O, Pt, Sc, Ma, Ma-S, Ma-O, Si, Dy, 
0, Pv, Do, Re, A, R, Ad, Dn, Es, Eo, No, Nu, Pn, Cn, Ca, Ne, and B. 


Figure 1. Plot of observed means of 42 MMPI scales against the predicted
means for the same scales based upon social desirability considerations.
(Horizontal axis: SD predicted means of MMPI scales; vertical axis:
observed means of MMPI scales.)


43 MMPI scales. It seems reasonable to believe that the equation
would give equally good predictions of the means of other personality
scales of the True-False type.


Discussion 

In Equation 1, the value of P is based upon an estimate of the
average probability of a socially desirable response for a given group
of subjects. The equation thus results in a prediction of the expected
mean score for a given group of subjects. It has been argued by
Edwards (1957) that individuals differ in their tendencies to give
socially desirable responses and the 39-item SD scale is regarded as
a measure of these individual differences. For the sample of 155
males investigated, we may note that the mean score on the 39-item
SD scale, expressed as a proportion of the total number of items in
the scale, is 30.79/39 = .79, and this may also be regarded as an
estimate of the parameter P in Equation 1. The similarity between
the two estimates, .81 and .79, based upon independent samples,
suggests that an individual's score on the SD scale expressed as a
proportion of the total number of items in the scale could be substituted
for P in Equation 1 to obtain predictions of individual
expected scores on the scales. The estimates of P, for individuals,
would not be expected to be as reliable as the estimated average
value of P for a given group and thus the individual predictions of
scores would not be expected to be as accurate as the prediction of
means.

The results of the present study also have some bearing upon the
contention of Fricke (1956), Jackson and Messick (1958), and
Couch and Keniston (1960) that scores on MMPI scales are influenced
considerably by acquiescent tendencies. It is of importance
to stress that in Equation 1, the values of X1 and X2 are the number
of keyed socially desirable and socially undesirable responses in a
scale and not the number of keyed True and False responses. Similarly,
P is not the average probability of a True response but rather
the average probability of a socially desirable response. However,
if there is a measurable tendency for subjects to acquiesce or respond
True to personality items, then it should also be possible to obtain
an estimate of the average probability of a True response,
independently of the items in the MMPI. Substitution of this estimate in
Equation 1 along with the number of keyed True and keyed False
items in an MMPI scale would then provide a prediction of the mean
score on the scale based upon acquiescent tendencies. If these
predicted means were found to be correlated with the observed means
on the MMPI scales, this would provide support for the hypothesis
that scores on the MMPI scales are influenced by acquiescent
tendencies.


Summary 


In an earlier study, the probability of a True response to a
personality statement was found to be linearly related to the social
desirability scale value of the statement. Using the data from the
earlier study, the average probability, P, of a socially desirable
response, defined as a True response to an item with a socially
desirable scale value or a False response to an item with a socially
undesirable scale value, was estimated as .81. The average probability,
Q = 1 - P, of a socially undesirable response was estimated
as .19.

An equation was presented in which the values of P and Q were 
applied to the number of items keyed for socially desirable re- 
sponses and the number of items keyed for socially undesirable 
responses in each of 43 MMPI scales to obtain predicted means for 
the scales. The predicted means were found to correlate .93 with the 
observed means based upon a sample of 155 males. 


REFERENCES


Couch, A. and Keniston, K. "Yeasayers and Naysayers: Agreeing
 Response Set as a Personality Variable." Journal of Abnormal
 and Social Psychology, LX (1960), 151-174.
Dahlstrom, W. G. and Welsh, G. S. An MMPI Handbook. Minneapolis:
 University of Minnesota Press, 1960.
Edwards, A. L. "The Relationship Between the Judged Desirability
 of a Trait and the Probability That the Trait Will Be Endorsed."
 Journal of Applied Psychology, XXXVII (1953), 90-93. (a)
Edwards, A. L. Manual for the Edwards Personal Preference
 Schedule. New York: Psychological Corporation, 1953. (b)
Edwards, A. L. The Social Desirability Variable in Personality
 Assessment and Research. New York: Dryden Press, 1957.
Edwards, A. L., Heathers, Louise B., and Fordyce, W. E. "Correlations
 of New MMPI Scales with Edwards SD Scale." Journal of
 Clinical Psychology, XVI (1960), 26-29.
Fordyce, W. E. "Social Desirability in the MMPI." Journal of
 Consulting Psychology, XX (1956), 171-175.
Fricke, B. G. "Response Sets as a Suppressor Variable in the OAIS
 and MMPI." Journal of Consulting Psychology, XX (1956),
Heineman, C. E. "A Forced-Choice Form of the Taylor Anxiety
 Scale." Journal of Consulting Psychology, XVII (1953), 441-
Jackson, D. N. and Messick, S. "Content and Style in Personality
 Assessment." Psychological Bulletin, LV (1958), 243-252.
Merrill, R. M. and Heathers, Louise B. "The Relation of the MMPI
 to the Edwards Personal Preference Schedule on a College
 Counseling Center Sample." Journal of Consulting Psychology,
 XX (1956),


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT
Vol. XXII, No. 1, 1962


WHAT DO PHYSICAL FITNESS TESTS MEASURE?— 
A REVIEW OF FACTOR ANALYTIC STUDIES* 


DELMER C. NICKS 


San Fernando Valley State College 
AND 
EDWIN A. FLEISHMAN 


Yale University 


There is, today, considerable interest in developing and maintaining
the physical proficiency of our manpower resources. Pro-
grams of physical education are an integral part of curricula in 
Public and private schools, colleges and universities. Such programs 
extend through all levels of training in the Armed Forces and to their 
military academies. It is possible that individual students spend 
More time in physical education programs than in any other single 
Program during their school careers. 

The President has appointed a special committee on physical 
education to study and advise him on the problem. Practically 
every leading magazine? during the past few years, has carried a 
feature story pointing up the relatively low level of performance 
of American youth on physical proficiency standards. One reason 


* This study was supported under Contract Nonr 009(32) between the Office
of Naval Research and Yale University. While the article was written by the
second author, much of the review was done by Dr. Nicks while he was Research
Associate on this project in the summer of 1958. He continued his association
with the project when he returned to his position at San Fernando
Valley State College, Northridge, California. His sudden and tragic death in
February, 1959, at the age of 29, saddened his many friends, students, and
colleagues and removed from the scene one of our most promising experimental
psychologists.


2 poPlmar Kremer provided valuable assistance in the conduct of this review. 
August, Каре, “The Report That Shocked the President,” Sports 1 ch 
Report, Кече Youth Physically Fit?", U. S. News and 


77 


78 EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 


for the increasing publie and official concern has been a series 
of studies which report that European youth is far superior to 
American youth in general physical proficiency (Kraus & Hirsch- 
land, 1953, 1954). These conclusions are based on six tests (the 
“Kraus-Weber Tests”). There is ample evidence, however, that 
these tests sample only two or three of many ability factors in 
physical proficiency. In one study, Hempel and Fleishman (1955) 
identified eight factors and there are indications from other research 
that still additional factors must be considered in any comprehensive 
evaluation of physical proficiency. Clearly, we need to know what 
physical proficiency factors need to be assessed and we need to 
know what tests are the best measures of these factors. 


Purpose 


This report presents a critical review and integration of previous 
factor analysis studies in this area. The review indicates the range 
of factors which have been identified as well as the kinds of tests 
which seem to measure them. It is hoped that this review will pro- 
vide a) a framework for subsequent research into the dimensions 
of physical proficiency and b) a rationale for the development of a 
comprehensive battery of physical proficiency measures. 


Nature and limitations of the review 


Most of the studies reviewed are in the physical education litera- 
ture. Many of these studies are concerned with building and assess- 
ing short batteries of tests which will correlate with a longer, more 
comprehensive, battery of tests. Thus, they are more concerned 
with increasing the efficiency of the testing process than with identi- 
fying basic abilities. This goal, however desirable, results in serious 
methodological problems when factor analysis is applied. For example,
factor analyses in this literature often include as variables a
number of composite scores from short batteries of tests. In some
articles over half the variables are of this nature. Frequently, these
analyses include course grades as variables in order to determine
what factors contribute to success in the course. Finally, many of
the individual tests are themselves exceedingly complex. It can be
seen why factor resolution, in many cases, has been difficult to
achieve in these studies.


Other difficulties stem from the analyses themselves. In several
studies factor extraction and/or factor rotations were stopped too
soon. This compounded the difficulty of comparing factors across 
studies, except in the area of strength tests, which was the best 
defined. In some studies it appeared that an oblique or hierarchical 
factor solution might have been more appropriate for the data. This 
was particularly true of analyses made within a highly delimited 
area where all of the tests were highly intercorrelated (for example, 
dynamometer strength tests). In several cases, we were able to 
extract additional factors or make additional rotations of the 
original author’s data in attempts to improve factor resolution and 
clarify interpretations.


Procedure 


A card file of factors was made to include a card for every factor
in each study reviewed. Each card contained the test loadings for
that factor. Similar factors were then sorted into piles as an aid in
comparing factors. Inspection of the tests in common made it possible
to identify some factors with different names as essentially the
same. In some cases, factors given the same name were really quite
different. Consequently, in the review to follow, factors sometimes
were given names other than those used by the original investigator.
For the most part, however, original factor names held up across
studies. Where different factor names were provided by different
authors, we used the name which we felt was most descriptive
operationally.
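
The card-sorting procedure described above can be sketched in code. What follows is a minimal illustration and not the authors' actual procedure: the three "cards," their loadings, the 0.30 salience cutoff, and the 0.5 overlap threshold are all hypothetical values chosen for the example. The sketch compares factors by the tests that load on them rather than by the names the original investigators gave them.

```python
from itertools import combinations

# Hypothetical "cards": one per factor per study, holding the author's
# factor name and the test loadings. All values are invented for illustration.
CARDS = [
    {"study": "A", "name": "Velocity",
     "loadings": {"standing broad jump": 0.70, "vertical jump": 0.65, "pull-ups": 0.10}},
    {"study": "B", "name": "Power",
     "loadings": {"standing broad jump": 0.68, "vertical jump": 0.60, "medicine ball put": 0.55}},
    {"study": "C", "name": "Velocity",
     "loadings": {"pull-ups": 0.72, "dips": 0.66, "vertical jump": 0.12}},
]


def salient_tests(card, cutoff=0.30):
    """Return the tests whose absolute loading reaches the (assumed) cutoff."""
    return {test for test, value in card["loadings"].items() if abs(value) >= cutoff}


def overlap(card_a, card_b, cutoff=0.30):
    """Share of salient tests the two factors have in common (Jaccard index)."""
    a, b = salient_tests(card_a, cutoff), salient_tests(card_b, cutoff)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(cards, threshold=0.5):
    """List pairs of studies whose factors would be sorted into the same pile."""
    return [(a["study"], b["study"])
            for a, b in combinations(cards, 2)
            if overlap(a, b) >= threshold]


pairs = similar_pairs(CARDS)
```

In this invented example, studies A and B land in the same pile although their factors carry different names, while A and C do not, although both factors are called "Velocity." That mirrors the two situations the procedure was meant to detect.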


Factor Areas 


It should be stressed that, despite the cautions and difficulties 
described above, there was considerable consistency in a number of 
the factors which emerged. Furthermore, these seem to fall into 
several broad areas of ability. We will describe these factors, point
out the tests which seem to measure them, and discuss some questions
raised by these findings.


Strength Area 


By far the most clearly defined area in the factor analysis literature
is the area of “strength.” When the intercorrelations among
tests of strength are factored, three broad factors emerge repeatedly.
These factors are Explosive Strength, Dynamic Strength, and Static
Strength. There appears to be some correlation between these
factors, though the correlation is not high. In studies of physical 
fitness tests where these three factors did not emerge, there was 
usually a “general strength" factor. This occurred when there were 
not enough strength tests to define the three separate factors. 

Let us examine these three factors more thoroughly. 

a) Explosive Strength. This factor was identified more often than 
any of the others (see, e.g., Brogden, Burke & Lubin, 1952; Coleman,
1937; Cumbee & Harris, 1953; Harris, 1937; Hempel & Fleishman,
1955; Hutto, 1938; McCraw, 1949; Rarick, 1937; Shapiro, 1947).
In addition, analyses by Carpenter (1941), Cousins (1955), High- 
more (1956), Larson (1941), McCloy (1940, 1956), Phillips (1949), 
and Seashore (1942) yielded factors which can be interpreted as 
Explosive Strength. This factor appears to emphasize the ability to 
exert maximum energy in one explosive act. It has been called Energy 
Mobilization or Power or Velocity in some studies. The purest tests 
of this factor include standing broad jump, vertical jump, and 
medicine ball put. Shot put has a loading on this factor almost as 
high as medicine ball put, but also loads significantly on other 
strength factors. The common feature of tests of Explosive Strength 
is that one is required to jump or to project oneself or to project 
some object as far or as high as possible. The factor is distinguished 
from other strength factors in requiring one short burst of effort, 
rather than continuous stress or repeated exertion. 

Short runs, dodging runs, shuttle runs, etc., often have appreciable 
loadings on this factor. This is probably due to the push-off type 
motions involved in many of these tasks. It seems reasonable that 
in these shorter sprints, a runner’s time is increasingly attributable 
to the speed with which the runner can “get off the blocks.” This is 
consistent with our notion of the Explosive Strength factor. 

There is some evidence that there are separate, though highly
correlated, Explosive Strength factors for arms and legs (Cumbee
& Harris, 1953; Rarick, 1937). These factors appeared separately,
together with a general Explosive Strength factor, in one of the
most careful hierarchical solutions (Brogden, Burke & Lubin, 1952)
and in a recent unpublished study factored by Nicks. The Explosive
Strength-Arm factor was defined by throws, puts, etc., while the
Explosive Strength-Leg factor was defined by various jump tasks.

b) Dynamic Strength. This factor has appeared in the literature
almost as frequently as the preceding factor (Brogden, Burke &
Lubin, 1952; Cousins, 1955; Cumbee & Harris, 1953; Hempel &
Fleishman, 1955; Larson, 1940, 1941; McCloy, 1956; McCraw, 1949;
Metheny, 1938; Seashore, 1942; Shapiro, 1947). It sometimes has
been called Velocity or Speed, but these names are somewhat misleading.
Dynamic Strength seems to involve the strength of muscles
in the limbs in moving or supporting the weight of the body repeatedly
over a given period of time. The best tests for this factor
seem to be pull-ups (chins), rope climb, and dips. Dips require the
subject to suspend himself between parallel bars with arms rigid; the
subject lets himself down and pulls himself up as many times as
possible. A critical aspect of this factor appears to be the requirement
that the muscular force must be repeated as many times as
possible, with a consequent progressive decrement in the force
which can be exerted. Individual differences in this ability are
largely a function of how many repetitions can be made.
Thus far, most of the tests defining this factor involve arm
muscles. There is some evidence, however, for a separate Dynamic
Strength factor involving the trunk muscles (Hempel & Fleishman,
1955; Phillips, 1949). Situps, leg lifts, and push-ups are examples of
tests loading on this factor. There is the further possibility of
separate arm and leg factors, although separate factors were not
isolated in any of the studies reviewed. The reason for this may be
that none of these studies included any tests that could be expected
to define a leg factor. The appearance of moderate loadings for short
sprints on this factor does suggest the possibility of a separate
though correlated leg factor (e.g., McCloy, 1956). It would be a
straightforward experiment to check this hypothesis. One study
found separate factors for arm extensors and arm flexors (Brogden,
Burke & Lubin, 1952). However, these factors were very highly
correlated. In any case, there appears to be a cluster of correlated
factors in this area which need more precise definition.
c) Static Strength. This third, broad strength factor has emerged
in several studies (Carpenter, 1941; Cureton, 1947; Harris, 1937;
Larson, 1940, 1941; Phillips, 1949; Rarick, 1937; Sills, 1950).
The purest tests of Static Strength appear to require an exertion of a
maximum force for a brief period of time where the force here is
exerted continuously up to a maximum. Typically, the force is
exerted against a fairly immovable object, such as a dynamometer.
This contrasts with Explosive and Dynamic Strength where there
is substantial movement of the body or limbs. Furthermore, in
Dynamic Strength the force must be repeated in successive movements
and in Explosive Strength the strain on the muscle is not
continuous. Tests which have defined Static Strength include
dynamometrical tests applied to hand grip, as well as to arm, back,
and leg muscles.

In an unpublished study, Nicks factored a small correlation matrix 
of dynamometrical tests provided by McHone, Tompkin, and Davis 
(1952). While he found some evidence for separate factors repre- 
senting various functional parts of the body, most of the tests turned 
out to be complex factorially and the factors were not well differ- 
entiated. It appears that it is probably not worthwhile to try to 
isolate a number of separate factors in the Static Strength area. It is 
of interest to note that before the application of factor analysis to 
these problems, test batteries of physical proficiency often placed 
considerable emphasis on different tests of static strength. The lack 
of correlation of Static Strength with the Dynamic and Explosive 
Strength factors, together with the greater practical implications of 
these latter factors for significant human activities, would argue 
against such overemphasis on tests of Static Strength. 


Flexibility—Speed Area 


Another ability area which seems distinct from Strength has been 
termed Flexibility. Tests of this factor appear to require the muscles 
involved to endure strain or distortion, with some emphasis on rapid 
recovery from this strain allowing an immediate repetition of the 
movement. There is evidence for separate Flexibility factors for the 
limbs and for the trunk. For example, Hempel and Fleishman (1955) 
found tests of kicking height and leg bends on a “Limb Flexibility” 
factor distinct from a “Trunk Flexibility” factor.

a) Extent Flexibility. An alternative breakdown of factors in this 
area is “Extent Flexibility" versus “Dynamic Flexibility.” The 
Hempel and Fleishman factors may be interpreted as Extent Flexi- 
bility. Tests of Extent Flexibility emphasize the ability to move or
stretch the body, or some part thereof, as far as possible in various
directions. For example, a person who could perform yoga exercises
would score extremely high on this factor. 

b) Dynamic Flexibility. Tests of the Dynamic Flexibility factor
involve the ability to make repeated flexing or stretching movements
(where the extent of the movements is either short or long). Examples
of such performances are squat twist and deep knee bends.

This section has been called “Flexibility—Speed” because analyses
of physical fitness tests frequently reveal a correlated cluster of
factors which emphasize both flexibility and speed of bodily movements
and it is difficult to separate them. Thus, factors called
“Speed of Limb Movement” and “Speed of Change of Direction” sometimes
emerge in analyses of such tests (Brogden, Burke & Lubin,
1952; Cumbee, 1954; Cumbee, Meyer & Peterson, 1957). One hypothesis
is that a hierarchical factor structure might best describe this
area. The most general factor would be a “General Flexibility—Speed”
factor. Contributing to this would be two broad second order
factors, Extent Flexibility and Dynamic Flexibility. Dynamic Flexibility,
which we have defined above, may be the same as a Speed of
Bodily Movement factor identified elsewhere. Finally, there would
be fairly narrow factors such as Speed of Limb Movement, Speed of
Change of Direction, and, perhaps, a Run factor. It is possible that
some of these might break up into specific limb factors. Figure 1
diagrams the structure which might be found.
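
The hypothesized hierarchy can also be written down as a small nested data structure. This is only a sketch of the structure as the paragraph above describes it. Placing the three narrow factors under Dynamic Flexibility is an assumption drawn from the text, and the names `HIERARCHY`, `leaves`, and `depth` are illustrative rather than anything taken from the original studies.

```python
# Hypothesized factor hierarchy, as described in the text (not an empirical result).
# Each key is a factor; its value holds the narrower factors assumed to fall under it.
HIERARCHY = {
    "General Flexibility-Speed": {
        "Extent Flexibility": {},
        "Dynamic Flexibility (Speed of Bodily Movement)": {
            "Speed of Limb Movement": {},
            "Speed of Change of Direction": {},
            "Run factor": {},
        },
    }
}


def leaves(tree):
    """Return the narrowest factors (those with nothing beneath them), in order."""
    found = []
    for name, children in tree.items():
        if children:
            found.extend(leaves(children))
        else:
            found.append(name)
    return found


def depth(tree):
    """Number of levels in the hierarchy (0 for an empty tree)."""
    if not tree:
        return 0
    return 1 + max(depth(children) for children in tree.values())


narrow_factors = leaves(HIERARCHY)
levels = depth(HIERARCHY)
```

If some narrow factors later break up into specific limb factors, as the text suggests they might, each would gain children of its own and the hierarchy would grow by one level.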

Some support for this interpretation is found in an unpublished
analysis by Nicks of data provided by McHone, Tompkin, and
Davis (1952). And these factors have been identified in separate
analyses by others. For the present, we will give the tentative definitions
of the factors which have been found in the area we call
“Dynamic Flexibility.” On occasion, the term “Velocity” has been
applied to one or all of these factors in the previous literature.

1. Speed of Change of Direction. This factor is defined by tests in
which the subject must quickly change direction, usually while
running (Brogden, Burke & Lubin, 1952; Cumbee, 1957; Cumbee &
Harris, 1953; Larson, 1941; Phillips, 1949; Shapiro, 1947; Wendler,
1938). Shuttle runs, dodging runs, and potato races load highly on
this factor. Some investigators have preferred the name “agility”
(e.g., Cumbee & Harris, 1953; Larson, 1941; McCloy, 1940; McCloy
& Young, 1954; Shapiro, 1947). The relation of this factor to agility
and to run tests needs to be established.

2. Run Speed. This factor has been identified repeatedly as
common to short and long dashes (Brogden, Burke & Lubin, 1952;
[...], 1955; Highmore, 1956; McCloy, 1956; Sills, 1950; Wendler,
1938). There is evidence that this factor correlates with the
Dynamic Strength factor previously described. There is some evidence
that run tests also correlate with endurance factors. For the
present we include it here on a logical basis, although its status is
not clear. A question that needs answering, for example, is whether
tests like Shuttle Run (which loads on a Speed of Change of Direction
factor) measure anything different from or additional to
straight long or short dashes. The literature treats them as measures
of separate abilities, but future research will clarify this practice.

[Figure 1. Possible hierarchical factor structure describing the
Flexibility—Speed area. Legible box labels: General Flexibility—Speed;
Dynamic Flexibility (Speed of Bodily Movement); Speed of Change ...;
Speed of ...]

3. Speed of Limb Movement. This is the ability to move the arms 
or legs as rapidly as possible, where skill is not involved. Thus, a 
Speed of Arm Movement factor has been found (Cumbee, 1953;
Cumbee, Meyer & Peterson, 1957; Fleishman, 1954, 1958; Fleishman
& Hempel, 1954, 1956) in tasks requiring the subject to strike two 
plates with a stylus, or to break photoelectric beams with rapid arm 
movements. 


Balance Area 


The factor structure in this area is not well defined since very few 
studies have included more than one or two balance tests. Not many 
tests of balancing ability have been developed. However, the studies 
which did include some balance tests furnish some suggestions of 
factors that might appear here (Bass, 1939; Carpenter, 1941; Cumbee,
1953; Cumbee, Meyer & Peterson, 1957; Hempel & Fleishman,
1955). There is evidence for separate static and dynamic balance
factors. These have been called Equilibrium Balance and Performance
Balance (Hempel & Fleishman, 1955). There is also some indication
that balancing ability may be related to whether the eyes
are open or not (Bass, 1939). This should be quite easy to test. The
relationship of these to a “Kinesthetic Discrimination” factor
measured by “tilting chair” tests (Fleishman, 1954) needs to be
determined. Finally, one study (Cumbee, Meyer & Peterson, 1957)
isolated a “Balancing Objects” factor in each of two studies. The
definitions of the balance factors found by our review follow.

a) Static Balance: This factor seems to represent the ability to
maintain bodily equilibrium in some fixed position. Often this position
may be an unusual one. Tests requiring the subject to stand on
[...] or to stand on a rail have loaded on this factor.

b) Dynamic Balance: Tests of this factor require the subject to
maintain balance while performing some task (for example, rail
walking). It is likely that good tests of this factor might require the
subject to balance himself on a very unstable object like a large ball.

c) Balancing Objects: The name of this factor is self-explanatory. 
Tests of this factor would involve balancing a yardstick on the end 
of the finger, or a ball on the back of the hand, etc. 


Coordination Area 


An area of physical proficiency which would appear distinct from 
strength, speed, flexibility, and the other factors mentioned, and 
important in its own right, is that of coordination. Yet correlational 
studies have failed to show up an ability which could be labeled with 
confidence as “general coordination.” An additional question of in- 
terest is whether there are several types of coordination. 

a) Multiple Limb Coordination: Fleishman (1956, 1958), Fleish- 
man and Hempel (1956), and Parker and Fleishman (1959) have 
identified a factor which they call “Multiple Limb Coordination” in 
analyses of perceptual-motor abilities. This factor is common to 
psychomotor tasks in which the subject must coordinate the simul- 
taneous movements of two hands, two feet, or hands and feet in 
operating various devices. Shapiro (1947) also found such a factor 
in psychomotor tasks. However, tests of this factor do not correlate 
very much with physical proficiency types of tasks (Adams, 1953; 
Shapiro, 1947). So, it appears that the kind of coordination empha- 
sizing simultaneous use of several limbs in operating equipment is 
not the same kind of coordination as might be involved in athletic- 
type tasks. In Fleishman’s studies with psychomotor devices, the 
subject is seated or standing in one place and is not required to move 
his whole body. Perhaps the critical distinction is that movement of 
the whole body is not involved in the kinds of tasks which appear on
the “Multiple Limb Coordination” factor.

b) Gross Body Coordination: Cumbee (1953), Cureton (1947),
Hempel and Fleishman (1955), Larson (1941), and Wendler (1938)
did identify a factor they called “Gross Body Coordination,” which
seemed to emphasize more gross activity of the whole body (e.g.,
hurdling and jumping tasks). Perhaps this is the same factor which
others have called “Agility.” The question is not yet answered, but
it is worth checking in future studies.


This is not to say that several coordination factors have not been
identified in the physical fitness area. Such factors have been found
but are poorly defined. For example, a general factor often labeled
“Gross Body Coordination” can be expected to appear when a
number of complex sports skill tests (e.g., ball catching, soccer
kicking) are included in a larger battery (e.g., Cumbee, 1953, 1957;
Wendler, 1938). However, this tells us little about the precise nature
of this factor or its possible components. The distinction between
this factor and one called “Motor Educability” (e.g., Larson, 1941;
McCloy, 1938; Metheny, 1938; Wendler, 1938) is not clear. Some
(e.g., McCloy, 1938) have viewed this factor as representing a kind
of physical fitness IQ general to tasks requiring large-muscle coordination.
The best measures of this factor are the Brace Test and
the Johnson Test. Both of these are composites of many subtests
emphasizing tumbling, hops, stunts, balance, etc. It is possible that
this factor taps some kind of “understanding of what has to be
done” in a complex skilled motor performance. However, it is likely
that the “Motor Educability” factor would break up into components
in suitably designed studies, as McCloy himself implies
(McCloy, 1954).

The findings of Fleishman on the definition of the Multiple Limb
Coordination factor suggest that this ability depends on central or
cortical nervous system activity. This inference is made from the
fact that people who do well on two-hand coordination tasks also
do well on foot-hand and two-feet coordination tasks. For example,
no separate factor confined to two-hand activity was found. The
generality of the “Gross Body Coordination” factor, whatever its
precise definition, would seem to imply an emphasis on central
factors independent of body members or particular muscle groups.
If this is true, then it will be difficult to develop pure tests of this
kind of coordination, just as it has been difficult to develop pure
tests of Multiple Limb Coordination. However, future studies
should be directed at a better definition of this ability area with
the subsequent attempts to develop some tests which emphasize
coordination and minimize strength, flexibility, balance, etc. Of
course, it may turn out that the essence of coordination is the ability
to integrate the separate abilities in a complex task. Analyses of
batteries containing “coordination” tests varied in specific ways
should clarify these questions.




Endurance Area 


Several studies with physical fitness tests have isolated a factor
labeled Endurance (Brogden, Burke & Lubin, 1952; McCloy, 1956)
or one which could be so interpreted (Cousins, 1955). McCloy
(1956) identified several different endurance factors, but on close
inspection two of these turn out to be more like our definitions of
Dynamic Strength and Explosive Strength; his third factor may
be Endurance. In a typical case (McCloy, 1956) the factor was de- 
fined by long runs and “drop-off” scores. The “drop-off” score was 
computed as a ratio of an individual’s speed for long and short 
runs; the assumption is that the greater the difference between long 
and short run performance, the poorer the “endurance.” Of course 
the inclusion of such scores in the same analyses with the long run 
scores would yield a spurious factor common to these scores. Thus, 
the factor labeled “endurance” in these studies may be nothing more 
than a specific “run factor." 
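
The point about spurious common variance can be illustrated with a small simulation. The sketch below is not a reanalysis of any study cited here. It assumes, purely for illustration, that short-run and long-run speeds are statistically independent normal variables with invented means and standard deviations, and it forms a "drop-off" score as the ratio of long-run to short-run speed. Even with no true relation between the two runs, the derived ratio correlates substantially with the long-run score from which it was computed.

```python
import random


def pearson(x, y):
    """Plain Pearson product-moment correlation for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=2000, seed=0):
    """Simulate independent run speeds and the ratio score derived from them.

    The means (7.0 and 5.0) and standard deviations (0.5) are hypothetical.
    """
    rng = random.Random(seed)
    short_speed = [rng.gauss(7.0, 0.5) for _ in range(n)]
    long_speed = [rng.gauss(5.0, 0.5) for _ in range(n)]  # independent of short_speed
    drop_off = [lng / sht for lng, sht in zip(long_speed, short_speed)]
    return pearson(short_speed, long_speed), pearson(long_speed, drop_off)


# r_runs: correlation between the two independent run scores (near zero).
# r_spurious: correlation between the long run and the ratio built from it.
r_runs, r_spurious = simulate()
```

Under these assumptions the two raw run scores are essentially uncorrelated, yet the ratio score and the long-run score share a large correlation simply because one is computed from the other. Factoring both in the same matrix would therefore tend to produce a factor common to them regardless of any underlying "endurance" ability.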

However, there is some evidence that an endurance factor may 
extend beyond run tests to other tests requiring subjects to perform 
over time. Thus, tests such as dips and pull-ups, when scored in 
terms of “number completed,” may load on a factor with running 
tests. Whether this means that long runs involve limb strength or 
that both long runs and certain strength tests depend on a separate 
“endurance” factor remains to be shown. It would be possible to 
explore this by giving some strength tests as “endurance” tests (e.g.,
do as many pull-ups as possible) and as timed tests (do as many as
you can in 30 seconds) and examining their relationships to other
endurance type tests (e.g., long runs). Is there any variance besides
strength, which we might label endurance, in the first type of pull-up
test which is not in the timed version?

In a recent unpublished study by Nicks, an endurance factor was 
tentatively identified as common to “leg raiser” and “bent arm 
hang” tests. The first test requires the subject to lie on his back 
and hold his legs 12 inches off the ground for as long as possible. 
In the second test the subject pulls himself up until his eyebrows 
are even with the chinning bar; he holds this position as long as he 
can. This factor appeared separate from a Dynamic Strength factor 
defined by a traditional “pull-up” test and a “dips” test, but it was 
highly correlated with it. 

The question to be answered is not whether “endurance” is important
in such tests, but whether it is necessary to postulate a
separate “endurance” ability over and above the strength factors
previously described. Such an ability, for example, might emphasize
the capacity to maintain maximal effort over time. In our present
state of knowledge we should allow for this possibility in future 
studies. Of special interest is the relationship of endurance to the 
strength area and the possibility that several endurance factors may 
exist. In the former instance it will be recalled that an alternative 
name for the “Explosive Strength” factor has been the name “En- 
ergy Mobilization.” 


Conclusion


This review has described fourteen factors of physical proficiency
identified from previous research. Other possible factors which
might be discovered were also described. A number of questions
were raised regarding the structure of skill in this area, and suggestions
were made for future studies to answer these questions.
Several things are clear. There is no such thing as general “physical
proficiency.” The problem is a multi-dimensional one. It is also
clear that previous studies comparing American youth with youth
of other countries have assessed only a small number of the factors
already identified. From the foregoing discussion, for example, it
would appear that the Kraus-Weber tests measure mainly the Extent
Flexibility (Trunk) and Dynamic Strength (Trunk) factors.

As a follow-up to this review, several large scale studies will be
conducted. An attempt will be made to include representative measures
of these factors and to administer them to large samples of
subjects. The objective is to answer some of the questions raised
about the structure of physical proficiency, to clarify some of the
factor definitions, and to identify new factors which might emerge.
Eventually, a battery of basic reference tests should be developed,
which would provide comprehensive coverage of abilities in this
area. Such measures would also allow an assessment of the relative
contributions of the component abilities to a variety of different,
more complex, athletic performances.

In the meantime, an outline and description of tests which might
be included in such studies has been prepared.³ Some are well known
tests, but others are new ideas. This outline also provides an interim
report of what abilities such tests probably measure.

³ This outline and description of tests has been deposited as Document
number [...] with the ADI Auxiliary Publications Project, Photoduplication
Service, Library of Congress, Washington 25, D. C. A copy may be secured by
citing the Document number and remitting $2.50 for photoprints and $1.75 for
35 mm. microfilm.


REFERENCES


A. Reports of original factor analysis studies.

Bass, R. I. “An Analysis of the Components of Tests of Semicircular
Canal Function and of Static and Dynamic Balance.”
Research Quarterly, X (1939), 38-52.

Brogden, H., Burke, L., and Lubin, A. “A Factor Analysis of
Measures of Physical Proficiency.” Department of the Army,
Personnel Research Section, PRS Report 937, 1952.

Carpenter, A. “A Critical Study of the Factors Determining
Effective Strength Tests for Women.” Research Quarterly,
IV (1937), 3-32.

Carpenter, A. “An Analysis of the Relationships of the Factors
of Velocity, Strength, and Dead Weight to Athletic Performance.”
Research Quarterly, XII (1941), 34-39.

Coleman, J. W. “The Differential Measurement of the Speed
Factor in Large Muscle Activities.” Research Quarterly,
VIII (1937), 123-133.

Cousins, G. F. “A Factor Analysis of Selected Wartime Fitness
Tests.” Research Quarterly, XXVI (1955), 277-288.

Cumbee, F. Z. “A Factorial Analysis of Motor Co-ordination.”
Research Quarterly, XXV (1954), 412-428.

Cumbee, F. Z. and Harris, C. W. “The Composite Criterion and
its Relation to Factor Analysis.” Research Quarterly, XXIV
(1953), 127-134.

Cumbee, F. Z., Meyer, M., and Peterson, C. “Factorial Analysis
of Motor Coordination Variables for Third and Fourth Grade
Girls.” Research Quarterly, XXVIII (1957), 100-108.

Fleishman, E. A. “Dimensional Analysis of Psychomotor Abilities.”
Journal of Experimental Psychology, XLVIII (1954),
437-454.

Fleishman, E. A. “A Comparative Study of Aptitude Patterns
in Unskilled and Skilled Psychomotor Performances.” Journal
of Applied Psychology, XLI (1957), 263-272.

Fleishman, E. A. “Dimensional Analysis of Movement Reactions.”
Journal of Experimental Psychology, LV (1958),
430-453.

Fleishman, E. A. and Hempel, W. E. “Factorial Analysis of
Complex Psychomotor Performance and Related Skills.”
Journal of Applied Psychology, XL (1956), 96-104.

Hall, D. and Wittenborn, J. R. “Motor Fitness of Farm Boys.”
Research Quarterly, XIII (1942), 432.

Harris, J. E. “The Differential Measurement of Force and Velocity
for Junior High School Girls.” Research Quarterly,
VIII (1937), 114-121.

Hempel, W. E. and Fleishman, E. A. “Factor Analysis of Physical
Proficiency and Manipulative Skill.” Journal of Applied
Psychology, XXXIX (1955), 12-16.

Highmore, G. “A Factorial Analysis of Athletic Ability.” Research
Quarterly, XXVII (1956), 1-11.

Hutto, L. E. “Measurement of the Velocity Factor and of
Athletic Power in High School Boys.” Research Quarterly,
IX (1938), 109-198.

Larson, A. “A Factor and Validity Analysis of Strength Variables
and Tests with a Test Combination of Chinning, Dipping,
and Vertical Jump.” Research Quarterly, XI (1940), [...].

Larson, A. “A Factor Analysis of Motor Ability Variables and
[...] Tests for College Men.” Research Quarterly, XII
(1941), 499.

McCloy, C. H. “The Measurement of General Motor Capacity
and General Motor Ability.” Supplement, Research Quarterly,
V (1934), 46-61.

McCloy, C. H. “The Measurement of Speed in Motor Performance.”
Psychometrika, V (1940), 173-182.

McCloy, C. H. “A Factor Analysis of Tests of Endurance.”
Research Quarterly, XXVII (1956), 213-216.

McCloy, E. “Factor Analysis Methods in the Measurement of
Physical Abilities.” Supplement, Research Quarterly, VI
(1935), 114-121.

McCraw, L. W. “A Factor Analysis of Motor Learning.” Research
Quarterly, XX (1949), 316-335.

MtHone, V. L., Toapkin, Q. W. and Davis, J. S. “Short Bat- 
teries of Tests Measuring Physical Efficiency for High 
School Boys.” Research Quarterly, XXIII (1952), 82-94. 
(Intercorrelation matrix for 19 tests given to 135 college
students, but no factor analysis. As mentioned in the review,
this matrix was factor analyzed by Dr. Delmer C. Nicks.
Another factor analysis of the same matrix can be found in
M Babee and Harris, 1953.)

eeny, E. “Studies of the Johnson Test as a Test of Motor 


E J. F., Jr. and Fleishman, E. A. “Prediction of Advanced 


illips M “А Stud ; е : ts b 
27, y of a Series of Physical Education Tests by 
ne analysis,” Research Quarterly, XX. (1949), 60-71. 
Activities » Research Q 
B. ‹ Quarterly, VIII (1937), 89—105. 
Roggen, A. “A Study of the Relationships between the ‘General 




man's Method of Analysis for ‘G’.” Supplement, Research 
Quarterly, VI (1935), 122-127. 

Seashore, H. G. “Some Relationships of Fine and Gross Motor
Abilities.” Research Quarterly, XIII (1942), 260-274.

Shapiro, J. J. “A Factor Analysis of Twenty Tests for Pilots
Given by the Army Air Force to West Point Cadets.” Unpublished
Master's thesis, Department of Psychology, University
of Southern California, 1947.

Sills, F. D. “A Factor Analysis of Somatotypes and of their
Relationship to Achievement in Motor Skills.” Research
Quarterly, XXI (1950), 424-457.

Wendler, A. J. “A Critical Analysis of Test Elements Used in 
Physical Education.” Research Quarterly, IX (1938), 64-76. 

The following references do not include original factor analyses.
However, they contain correlational, reliability, or normative
data, literature reviews, or measurement suggestions relevant to
this review.

Adams, J. A. “An Evaluation of Test Items Measuring Motor 
Abilities.” USAF Personnel and Training Research Center. 
Research Report 56-55. Lackland Air Force Base, Texas, 
1955. (The correlations among physical fitness measures are
presented and their lack of prediction of pilot success
demonstrated.)

Bookwalter, K. W. “A Survey of Factor Analysis Studies in
Physical Education.” The Physical Educator, 11 (1942),
209-212.

Bookwalter, K. W. and Bookwalter, C. W. “A Measure of 
Motor Fitness for College Men.” Bulletin of the School of
Education, Indiana University, XIX (1934), 5-16. (Inter- 
correlations of 17 motor ability tests, with over 900 college 
men as subjects.) 

Brace, D. K. “Studies in Motor Learning of Gross Bodily Motor
Skills.” Research Quarterly, XVII (1946), 242-253.

Brown, H. S. “A Comparative Study of Motor Fitness Tests.”
Research Quarterly, XXV (1954), 8-19. (Includes a survey
of factor analysis studies and intercorrelations of 28 physical
education tests, using 208 college men as subjects.)

cher, C. A. and Thompson, D. W. “The Relationship between
the Physical Fitness Ratings of Aviation Cadets and Certain
Early Life Experiences Pertinent to Physical Activity.”
Research Quarterly, XXX (1959), 136-143.

Clarke, H. H. The Application of Measurement to Health and 
Physical Education. New York: Prentice-Hall, Inc., 1946. 
(Includes an intercorrelation matrix for 5 strength variables.)

Clarke, H. H. “Objective Strength Tests of Affected Muscle Groups
Involved in Orthopedic Disabilities.” Research Quarterly, XIX
(1948), 118-147.

Clarke, H. H. “Improvement of Objective Strength Tests of
Muscle Groups by Cable-Tension Methods.” Research Quarterly,
XXI (1950), 399-419. (Detailed directions and pictures.)




Clarke, H. H. "Relationship of Strength and Anthropometric 
Measures to Various Arm Strength Criteria.” Research Quar- 
terly, XXV (1954), 134-143. 

Clarke, H. H. “Relationships of Strength and Anthropometric 
Measures to Physical Performances Involving the Trunk 
and Legs.” Research Quarterly, XXVIII (1957), 223-232. 
(Includes an intercorrelation matrix of 16 strength variables.) 

Clarke, H. H., Bailey, T. L., and Shay, C. T. “New Objective 
Strength Tests of Muscle Groups by Cable-Tension Meth- 
ods.” Research Quarterly, XXIII (1952), 136—148. (De- 
tailed directions and pictures given.) 

Clarke, H. H. and Carter, G. H. “Oregon Simplification of the 
Strength and Physical Fitness Indices.” Research Quarterly, 
XXX (1959), 3-10. 

Cureton, T. K. Endurance of Young Men. Society for Research 
in Child Development Monograph, X (1945), No. 1. 

Cureton, T. K. Physical Fitness Appraisal and Guidance. St. 
Louis: C. V. Mosby, 1947. 

Cureton, T. K. and Larson, L. A. “Strength as an Approach to
Physical Fitness.” Research Quarterly, XII (1941), 391-406.

Fruchter, B. Introduction to Factor Analysis. New York: D.
Van Nostrand, 1954.

Gates, D. D. and Sheffield, R. P. “Test of Change of Direction as
Measurements of Different Kinds of Motor Ability in Boys
of the 7th, 8th, and 9th Grades.” Research Quarterly, XI
(1940), 136-147. (Description of a number of tests for
change of direction, mostly dodging runs and obstacle runs,
with normative data.)

Gire, E. and Espenschade, A. “The Relationship between Meas- 
ures of Motor Educability and the Learning of Specific Motor 
Skills.” Research Quarterly, XIII (1942), 43-56. (Intercorrelations
for Brace, Iowa-Brace, and Johnson test batteries.)

Guilford, J. P. “A System of the Psychomotor Abilities.” American
Journal of Psychology, LXXI (1958), 164-174. (One
way of classifying physical fitness and other psychomotor
factors is presented.)

Hunsicker, P. A. and Donnelly, J. “Instruments to Measure
Strength.” Research Quarterly, XXVI (1955), 408-420.
(Descriptions of various dynamometers.)

Hunsicker, P. A. and Greey, G. “Studies in Human Strength.”
Research Quarterly, XXVIII (1957), 109-122.

M. L. M. A Factorial Analysis of Ability in Fundamental
Motor Skills. Teachers College, Columbia University,
Contributions to Education, 1935. (Monograph series) (Not a
factor analysis in the contemporary sense. Intercorrelations
of some of the common physical fitness tests are included.)

Kennedy, F. T. “Substitution of the Tensiometer for the
Dynamometer in Back and Leg Testing.” Research Quarterly,
XXX (1959), 179-188.

Kraus, H. and Hirschland, R. P. “Muscular Fitness and Health.”
Journal of the American Association of Health, Physical
Education, and Recreation, XXIV (1953), 10-17.

Kraus, H. and Hirschland, R. P. “Minimum Muscular Fitness
in School Children.” Research Quarterly, XXIV (1954), 178-
188.

Larson, L. A. “Some Findings Resulting from the Army Air
Forces Physical Training Program.” Research Quarterly,
XVII (1946), 144-164.

Larson, L. A. and Yocom, R. D. Measurement and Evaluation
in Physical Health and Recreation Education. St. Louis:
C. V. Mosby, 1951.

Leighton, J. R. “A Simple Objective and Reliable Measure of
Flexibility.” Research Quarterly, XIII (1942), 205-216. (An
application of goniometer type instruments to 13 different
movements; reliabilities computed.)

Mathews, D. K. Measurement in Physical Education. Philadel- 
phia: W. B. Saunders Co., 1958. (Summary statement about 
factors from factor analysis studies on page 116.) 

McCloy, C. H. “An Analytical Study of the Stunt Type Test
as a Measure of Motor Educability.” Research Quarterly,
VIII (1937), 46-55. (Describes a number of stunt type tests, 
nearly all taken from or adapted from the Brace battery.) 

McCloy, C. H. “A Preliminary Study of Factors in Motor Educability.”
Research Quarterly, XI (1940), 28-39.

McCloy, C. H. “The Factor Analysis as a Research Technique.”
Research Quarterly, XII (1941), 22-23.

McCloy, C. H. and Young, N. D. Tests and Measurements in
Health and Physical Education. New York: Appleton-Century-Crofts,
1954.

McCloy, E. “Factor Analysis Methods in the Measurement of
Physical Abilities.” Supplement, Research Quarterly, VI
(1935), 114-122.

McCraw, L. W. and Tolbert, J. W. “A Comparison of the Reliabilities
of Methods of Scoring Tests of Physical Ability.”
Research Quarterly, XXIII (1952), 73-81.

Phillips, B. E. “The JCR Test.” Research Quarterly, XVIII
(1947), 12-29. (Includes intercorrelations for vertical jump,
WR and dodging run for 168 entering West Point
Cadets.)

Rasch, P. J. “Relationship of Arm Strength, Weight and Length
to Speed of Arm Movement.” Research Quarterly, XXV
(1954), 328-332. (Demonstrates lack of correlation between
arm strength and speed of arm movements.)

Sargent, L. W. “Some Observations on the Sargent Test of Neuromuscular
Efficiency.” American Physical Education Review,
February, 1924.

Scott, M. Gladys. “Measurement of Kinesthesis.” Research
Quarterly, XXVI (1955), 324-341.

Shaffer, G. K. “Variables Affecting Krauss-Weber Failures
among Junior High School Girls.” Research Quarterly, XXX
(1959), 75-86.




ia, D. A. and Karpovitch, P. V. “The Harvard Step Test
as a Measure of Endurance in Running.” Research Quarterly,
XXII (1951), 381-384.

son, M. E. “A Study of Reliabilities of Selected Gross
Muscular Co-ordination Test Items.” Air Training Command,
Lackland Air Force Base, Texas, Human Resources
Research Center, Research Bulletin 52-29, 1952.

n, M. E., Thompson, J. P. and Dusek, E. R. “Tests of
Muscular Coordination for Use in Selection of Personnel.”
Final Report, Contract AMC No. AF 83 (038)-22948,
Research and Development Command, Lackland Air
Force Base, Texas, 1952.

n, M. E., Thompson, J. P., and Dusek, E. R. “Tests of
Ability or Gross Muscular Coordination.” Air Research
and Development Command, Lackland Air Force Base,
Texas, Human Resources Research Center, Research
Bulletin, 59-25, 1953.

Thurstone, L. L. Multiple Factor Analysis. Chicago: University
of Chicago Press, 1947.

W. W., Janney, D. D., and Salzano, J. V. “Relation of
Maximum Back and Leg Strength to Back and Leg Strength
Endurance.” Research Quarterly, XXVI (1955), 96-106.

ller, A. G. “Analytical Study of Strength Tests Using the
Universal Dynamometer.” Supplement, Research Quarterly,
VI (1935), 81-85.

18, Olive C. “A Study of Kinesthesis in Relation to Selected
Movements.” Research Quarterly, XVI (1945), 277-287. (19
kinesthesis tests plus general motor ability batteries given to
37 women and the intercorrelations are presented.)


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT
Vol. XXII, No. 1, 1962


A FACTOR ANALYSIS OF THE KUDER PREFERENCE
RECORD—OCCUPATIONAL, FORM D¹


RICHARD E. SCHUTZ AND ROBERT L. BAKER
Arizona State University 


The Kuder Preference Record—Occupational, Form D (Kuder,
1956) is a recent entry in the field of interest measurement.
Composed of 100 triadic items, the instrument is scored using
empirically-based keys for specific occupations. At the time of this
study, 42 published keys were available.

The Occupational has a number of attractive features. The directions
are lucid and straightforward; the vocabulary difficulty is controlled;
it requires only about 30 minutes to complete; and it is
machine scorable. One practical limitation in using the Occupational
in a counseling situation is the problem of selecting appropriate
keys for scoring and then organizing the results for interpretation
to the client. Each of the 42 scales is scored with a separate
key, and no published profile form is available. The keys are simply
numbered in the chronological order of development.

The purpose of this study was to determine the underlying factor
structure of occupational interests as measured by the Form D. A
knowledge of this structure is of theoretical interest and is also
useful in categorizing the keys to facilitate selection and interpretation
of the scales.

Method 


The Occupational was administered to all freshmen students entering
Arizona State University in fall, 1959. A representative sample
of 450 males, having Verification raw scores of 49 and above,
was used in the present study. The answer sheets were machine
scored and the 42 scores for each examinee were punched into IBM
cards.

¹ The results of this study were presented at the annual meeting of the
American Psychological Association, New York, N. Y., September 1, 1961.

A product-moment intercorrelation matrix was prepared and a
principal components analysis was performed.² Components with
eigenvalues greater than unity were rotated to simple structure
using normalized Varimax procedures. All statistical computations
were performed using an IBM 709 computer.
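
The procedure just described (principal components of a correlation matrix, retention of components with eigenvalues greater than unity, and normalized Varimax rotation) can be illustrated with a minimal sketch. It is not the authors' program, which ran on an IBM 709; the function names, the SVD-based rotation routine, and the six-variable toy matrix are all assumptions introduced here for illustration, and none of the numbers come from the study.

```python
import numpy as np


def principal_component_loadings(corr):
    """Return eigenvalues (descending) and unrotated component loadings."""
    eigvals, eigvecs = np.linalg.eigh(corr)  # symmetric input, ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return eigvals, loadings


def varimax(loadings, normalize=True, max_iter=500, tol=1e-10):
    """Orthogonal varimax rotation; normalize=True rescales rows to unit length first."""
    loadings = np.asarray(loadings, dtype=float)
    n_vars, n_factors = loadings.shape
    scale = np.ones(n_vars)
    if normalize:
        scale = np.sqrt((loadings ** 2).sum(axis=1))
        scale[scale == 0] = 1.0
    a = loadings / scale[:, None]
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        b = a @ rotation
        target = b ** 3 - b @ np.diag((b ** 2).sum(axis=0)) / n_vars
        u, s, vt = np.linalg.svd(a.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if criterion != 0.0 and new_criterion / criterion < 1.0 + tol:
            break
        criterion = new_criterion
    return (a @ rotation) * scale[:, None]


def kaiser_varimax(corr):
    """Keep components with eigenvalue > 1, then rotate them."""
    eigvals, unrotated = principal_component_loadings(corr)
    n_keep = int((eigvals > 1.0).sum())
    kept = unrotated[:, :n_keep]
    return n_keep, kept, varimax(kept)


def demo_correlation_matrix():
    """Toy matrix with two independent blocks of three variables (not study data)."""
    corr = np.eye(6)
    corr[:3, :3] = 0.64
    corr[3:, 3:] = 0.49
    np.fill_diagonal(corr, 1.0)
    return corr


if __name__ == "__main__":
    n_keep, kept, rotated = kaiser_varimax(demo_correlation_matrix())
    print("components retained:", n_keep)
    print(np.round(rotated, 2))
```

An orthogonal rotation leaves each variable's communality (the row sum of squared loadings) unchanged, which gives a convenient check on any implementation of this step.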


Results 


The analysis yielded eight rotated factors.⁴ However, the eighth
factor has only two variables with loadings above .30 (Job Printer
.76 and Veterinarian .46) and is not easily interpreted. Tables 1-7
present the scales with loadings above .30 on each of the other seven
factors.

Factor 1 has been labeled Interpersonal-Directive. Each of the
occupations represented here involves an interpersonal relationship
of some sort. Moreover, the relationship is one in which an authority
figure attempts to manipulate, direct, or otherwise control the behavior
of other individuals. The factor appears similar to Group V
of the Strong Vocational Interest Blank (YMCA Physical Director,
Personnel Manager, Public Administrator, Vocational Counselor,
YMCA Secretary, Social Science Teacher, School Superintendent,
Minister) (Strong, 1943).

Although the highest loadings on Factor 2 relate to engineering,
it seemed reasonable to designate the factor as Engineering-Physical


² Several items in the Occupational are included in more than one key,
spuriously insuring an ... on scores scored with and without item overlap.


rough the courtesy of Western Data Processing 
Center, Graduate School of Business Administration, University of Californi 


Р. Anderson, Program Consultant 


Service, Library of Co Washi tief, hoto- 
prints, or $125 for 35 Timis ls DP Daging $135 for T 




TABLE 1
Factor 1: Interpersonal-Directive


Scale No.   Loading   Occupation
   14         .87     Personnel Manager
   32         .85     High School Counselor
   10         .84     YMCA Secretary
    9         .76     Psychologist, Industrial
   41         .74     Pharmaceutical Salesman
   19         .74     Psychologist, Counseling
   11         .72     School Superintendent
    8         .63     Psychologist, Clinical
   30         .62     Radio Station Manager
    5         .61     Minister
   22         .61     Lawyer
   29         .56     Psychiatrist
    4        -.54     Forester
   24         .47     Insurance Agent
   27         .47     Industrial Engineer
   23         .46     Retail Clothier
   16         .46     Psychologist, Professor
   15         .43     Department Store Salesman
   42         .39     Male Librarian
   37         .34     Druggist
    3        -.34     Farmer


Science to reflect all of the loadings. The factor closely parallels
Strong Group II, Engineering and Physical Sciences (Physicist,
Chemist, Mathematician, Engineer).

Factor 3 appears quite homogeneous behaviorally. It is labeled 


TABLE 2 
Factor 2: Engineering-Physical Science 


Scale No.   Loading   Occupation
   36         .84     Mining and Metallurgical Engineer
   18         .82     Civil Engineer
    2         .82     Mechanical Engineer
   27         .79     Electrical Engineer
   13         .72     Industrial Engineer
   35         .66     Meteorologist
   15         .63     Chemist
   34        -.55     Department Store Salesman
   12         .54     High School Mathematics Teacher
   33         .48     Accountant
   16         .45     High School Science Teacher




TABLE 3 
Factor 3: Business-Detail


Scale No. Loading Occupation 


40 41 Bank Cashier 

23 .62 Retail Clothier 

12 .58 Accountant 

24 .54 Insurance Agent 

37 .36 Druggist 

30 .88 Radio Station Manager 


Business-Detail, the same designation given to the Strong Occupa- 
tional Group VIII. The two factors are directly parallel. 


TABLE 4 
Factor 4: Business-Aesthetic


Scale No. Loading Occupation 
31 -79 Interior Decorator 
21 т Architect 
3 —.65 Farmer 
30 48 Radio Station Manager 
15 .48 Department Store Salesman 
34 aer High School Mathematics Teacher 
42 .36 Male Librarian 
8 32 Psychologist, Clinical 
E B Retail Clothier 


Factor 4 likewise appears to involve business activity. However,
the activity is in a rather different context from Factor 3. The
highly loaded occupational scales here emphasize order, beauty, and
form; thus the title Business-Aesthetic. This factor has no parallel
in the Strong group structure.


TABLE 5
Factor 5: Verbal-Directive


Scale No, Loading Occupation
42 .83 Newspaper Editor
22 .64 Male Librarian
16 .59 Lawyer
39 .42 Psychologist, Professor
29 — AL X-Ray Technician
Б .38 Psychiatrist
8 88 Minister
19 .35 Psychologist, Clinical
28 .35 Psychologist, Counseling
30 i^ Pediatrician

Factor 5 has been labeled Verbal-Directive. The pattern is quite 
similar to the Strong Group X, Verbal or Linguistic. However, the 
present contributing scales and their loadings suggest that the di- 
mension also involves a common element of authority or ascendency. 


TABLE 6 
Factor 6: Outdoor 

Scale No. Loading Occupation
1 Ni County Agricultural Agent 
4 .61 Forester 
3 .97 Farmer 
26 .36 Veterinarian 
12 —.83 Accountant 


Factor 6 is quite clearly an “Outdoor” factor. The factor has no
parallel in the Strong group structure. Two of the occupations are
included in Group IV, Technical and/or Skilled Trades, but here
the similarity ends since the present factor is more homogeneous
with respect to the behavioral aspects of the occupations involved.


TABLE 7
Factor 7: Health Scientist

Scale No. Loading Occupation
| n En Dentist 
Е ‘81 Physician 
| = 79 Pediatrician 
| 39 .74 High School Science Teacher 
37 .72 X-Ray Technician 
26 66 Druggist 
35 64 Veterinarian 
29 58 Chemist 
i <56 Psychiatrist 
\ 13 .53 Psychologist, Professor 


AO Meteorologist 
Е 47 High School Mathematics Teacher 
» А1 Psychologist, Clinical ] 
5 .95 Mining and Metallurgical Engineer 
32 .95 Minister 
19 .34 High School Counselor 
1 


The occupations which load Factor 7 are also quite homogeneous.
The factor has been designated Health Scientist. The factor has no
direct parallel in the Strong groups but does have some overlap
with Group I, Biological Sciences.


Discussion


The factors appear to reflect seven readily interpretable dimen- 
sions which are tapped by the Occupational. While the over-all 
structure bears a resemblance to that found for the Strong Voca- 
tional Interest Blank for Men, several of the Occupational factors 
appear to be more homogeneous behaviorally than the Strong groups. 

The implications of the findings for practical test usage seem
clear. If one wishes to obtain an efficient cross section of an
examinee's occupational interests, it would appear sufficient to score
the two or three scales which contribute the greatest variance to
each of the seven factors. This procedure would ease the administrative
problem, simplify the profile, and make interpretation more
meaningful to the examinee.

When one wishes to score all the 42 scales, the findings provide
a basis for categorizing the scores for reporting purposes. The
classification suggested in Table 8 assigns each occupational scale to one
of seven clusters. In general, each of the scales has been assigned to
the factor on which it has the highest loading. However, the pattern
of loadings for each scale was also taken into consideration for
factorially complex scales. Thus the clusters are not completely pure
factorially. The scales have been arranged within clusters as closely
as possible in order of their factorial contribution. Scales which have
loadings greater than .40 on more than one factor include the
secondary factor number in parentheses following the scale title.
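
The general assignment rule in this paragraph can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, loadings are compared by signed value (so a negative loading such as Forester's on Factor 1 is not treated as secondary), and the authors' additional judgment about the pattern of loadings for complex scales is not modeled. The example loadings are the legible entries for four scales as printed in Tables 1, 3, and 6.

```python
ROMAN = {1: "I", 2: "II", 3: "III", 4: "IV", 5: "V", 6: "VI", 7: "VII", 8: "VIII"}


def assign_cluster(loadings_by_factor, secondary_cutoff=0.40):
    """Return (primary factor, list of secondary factors) as Roman numerals.

    Primary factor: the factor with the highest (signed) loading.
    Secondary factors: any other factor with a loading greater than the cutoff.
    """
    primary = max(loadings_by_factor, key=loadings_by_factor.get)
    secondary = sorted(
        factor
        for factor, value in loadings_by_factor.items()
        if factor != primary and value > secondary_cutoff
    )
    return ROMAN[primary], [ROMAN[factor] for factor in secondary]


def cluster_label(scale, loadings_by_factor):
    """Format a scale the way Table 8 does, e.g. 'Retail Clothier (I)'."""
    _, secondary = assign_cluster(loadings_by_factor)
    suffix = f" ({', '.join(secondary)})" if secondary else ""
    return scale + suffix


# Legible loadings above .30 for four scales (Tables 1, 3, and 6).
EXAMPLE_LOADINGS = {
    "Personnel Manager": {1: 0.87},
    "Retail Clothier": {1: 0.46, 3: 0.62},
    "Insurance Agent": {1: 0.47, 3: 0.54},
    "Forester": {1: -0.54, 6: 0.61},
}

if __name__ == "__main__":
    for scale, loadings in EXAMPLE_LOADINGS.items():
        primary, _ = assign_cluster(loadings)
        print(f"{primary:>4}  {cluster_label(scale, loadings)}")
```

For these four scales the sketch reproduces the placements shown in Table 8: Retail Clothier and Insurance Agent fall in the Business-Detail cluster with (I) as a secondary factor, and Forester falls in the Outdoor cluster with none.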


Summary 


The 42 raw scores obtained on the Kuder Occupational, Form D
by 450 college freshmen males were used as the basis for a principal
components analysis. The analysis yielded seven readily interpretable
rotated factors: I. Interpersonal-Directive, II. Engineering-Physical
Science, III. Business-Detail, IV. Business-Aesthetic, V.
Verbal-Directive, VI. Outdoor, VII. Health Scientist. The implications
of the findings for practical test usage are discussed.


TABLE 8
Suggested Clusters of Occupational Scales


Interpersonal-Directive
    14* Personnel Manager
    32  High School Counselor
    10  YMCA Secretary
    41  Pharmaceutical Salesman
    11  School Superintendent
        Minister
     9  Psychologist—Industrial
    19  Psychologist—Counseling
     8  Psychologist—Clinical (VII)
    16  Psychologist—Professor (II, V, VII)

Engineering-Physical Science
    36  Mining and Metallurgical Engineer
        Mechanical Engineer
        Civil Engineer
        Industrial Engineer (I)
        Meteorologist (VII)
        Chemist (VII)
    34  High School Mathematics Teacher (IV, VII)
    33  High School Science Teacher (VII)

Business-Detail
    40  Bank Cashier
    12  Accountant (II)
    24  Insurance Agent (I)
    23  Retail Clothier (I)
    38  Job Printer

Business-Aesthetic
    31  Interior Decorator
    21  Architect
    15  Department Store Salesman (I)
    30  Radio Station Manager (I)
    42  Male Librarian

Verbal-Directive
        Newspaper Editor
        Lawyer (I)
        Journalist

Outdoor
        County Agricultural Agent
        Forester

Health Scientist
    29  Psychiatrist (I)
    26  Veterinarian (VIII)
    39  X-Ray Technician (V)
    37  Druggist


REFERENCES


Kassebaum, G. G., Couch, A. S., and Slater, P. E. “The Factorial
Dimensions of the MMPI.” Journal of Consulting Psychology,
XXIII (1959), 226-230.

Kuder, G. Frederic. Kuder Preference Record—Occupational, Form 
D. Chicago: Science Research Associates, 1956. 

Strong, E. K., Jr. Vocational Interests of Men and Women. Palo 
Alto: Stanford University Press, 1943. 




EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT
Vol. XXII, No. 1, 1962


MATHEMATICAL ABILITY AS RELATED TO REASONING
AND USE OF SYMBOLS


SISTER M. CANISIA, C.S.F.N.¹
Loyola Psychometric Laboratory 


The purpose of this study was to make a contribution to the better
understanding of the psychological nature of the factors that
seem to enter into mathematical ability. No attempt was made to
define mathematical ability except to regard mathematics itself
as a way of thinking.

Mathematical thinking does not appear to be essentially different
from any other kind of thinking, but it does have characteristic
qualities of its own. In this study, therefore, emphasis was placed
on the identification of processes involved in mathematical thinking,
especially those which appear to be distinctively characteristic
of it.


Survey of Literature 


Early factorial studies of mathematical ability were mainly concerned
with the question of whether or not there existed for mathematics
a group factor over and above G, the general intellective factor.
Spearman (1927, 1951), Collar (1920), Fouracre (1926), Blackwell
(1940), and Oldham (1937) claimed that there was no such
factor. Later studies (Barakat, 1951; Lee, 1955; Wrigley,
1958) point to an opposite view: although there is close connection
between mathematical and general ability (Wrigley) and innate
intelligence plays the largest role in mathematics attainment
(Barakat), there does exist a mathematical group factor which links the
different branches of mathematics more closely than would be the
case if only a general factor were in operation. Feierabend (1959)
writes: “The unanswered question remains as to whether all persons
of sufficient general intelligence have equal potential for mathematics,
or whether there may not exist some special abilities, factors,
or conceptual approaches which are specific to the field of mathematics
or perhaps to creativity in mathematics thinking.”

¹ This study was carried out at the Loyola Psychometric Laboratory, Chicago,
under the direction of Dr. Horatio J. A. Rimoldi to whom the
writer wishes to express her gratitude for the inspiration, guidance, and
encouraging help so consistently extended throughout all the stages of the work.

From his numerous researches, Guilford (1956) reports the identification
of about forty different factors in the complex structure of
the human intellect and suggests that the picture is still incomplete.

The number factor has generally been associated with computa- 
tional ability. Coleman (1956) found computational ability to be 
distinct from and only remotely related to mathematical ability. 
Weber (1954) claims that there is no connection between the number
factor and mathematical talent. In discussing the factors isolated
in his classic studies of intelligence, Thurstone (1942) pointed out
that the psychological nature of the number factor was not as clear
as that of the other primaries and that perhaps “number” as such
was not an adequate description of the factor. Investigations of the
number factor by Coombs (1941) and Werdelin (1958) indicate that
the factor is more basic and general than number.

Mathematical tests used in previous studies of mathematical ability
have been almost exclusively in the nature of achievement tests
measuring scholastic skills, particularly routine mastery of computational
skills. Factors found in factorial studies based on such tests
appear to be unduly weighted by scholastic variables.


Preliminary Study 

In order to crystallize the problem and to help formulate the
hypotheses to be advanced in the main study, the writer performed
a preliminary factorial analysis of a sub-battery of fourteen of the
fifty-seven psychological tests that were used by Thurstone (1938).
The tests selected were chosen either because they involved mathematical
content or because they could serve as reference variables.
Eight centroid factors were extracted and rotated to simple structure.
The mathematical tests were found on five of the factors: number,
abstraction, and three factors of a reasoning nature.
Mathematical ability appeared to be a complex combination of
several distinct mental processes.


The Main Study


Hypotheses For The Study. An integration of the implications drawn 
from the findings of previous research, including the preliminary 
study, and from the writings of mathematicians, psychologists, and 
educators has led to the development of the following hypotheses: 


1. Mathematical ability involves essentially: 

a) Ability or abilities to see or discover relations, realize 
their implications, and make inferences from them. 

b) Ability to educe correlates, to extract from given data 
facts not explicitly stated. 

c) A fluency in the manipulation of certain symbols; an abil- 
ity to handle abstract qualities without concrete aids. 

d) Ability to analyze a situation, distinguish relevant from 
irrelevant data, and organize a sequence of steps leading 
to a solution. This ability may be considered a function of 
the abilities assumed in a, b, and c above. 

2. The number factor commonly associated with mathematical
ability is not limited to purely computational aspects.
Computational facility may or may not be an important element
in mathematical ability.

3. Other factors enter into mathematical ability in varying
degrees depending upon the context of the problem or the nature
of the task.

4. To get at the basic, underlying factors that determine the
dynamics of mathematical thinking, the tests used in the
study ought to be of such a nature as to depend little, if at all,
on formal training in the particular subject branches.


General Description of the Variables. A battery of thirty-six tests
was assembled. Ten of the tests were standardized tests. All the
other tests were constructed by the writer. Of these, twelve were
patterned after tests used by other investigators in previous studies,
and fourteen were original tests developed especially for this study.
Tests 19, 20, 21, and 22 were reference tests for the number factor.
Tests 2, 8, 10, 11, 12, 17, and 18 were reference tests for reasoning
factors. Tests 4 and 5 were reference tests for the verbal factors
and test 3 for the space factor. Tests 6 and 7 were included as
external criteria of mathematical ability. The eighteen experimental
tests which completed the battery were designed to explore hypotheses
about the nature of mathematical ability.


Descriptions of the Tests²


Tests 1 to 5 inclusive are the SRA Thurstone tests of the Primary 
Mental Abilities: Number, Reasoning, Space, Verbal-Meaning, and 
Word-Fluency, respectively. 

Tests 6 and 7 are the California Tests of Mathematics, Advanced
Form, Reasoning and Fundamentals, respectively. 

Tests 8 to 10 inclusive are the Reasoning Tests from the Holzinger- 
Crowder Uni-Factor Tests: Mixed Series, Figure Changes, and 
Teams. 

Test 11: Number Series is a free-answer test patterned after num- 
ber series tests in which the next term is to be supplied. 

Test 12: Number Series 2 is also a free-answer number series test. 
The number that “does not belong” is to be indicated.

Test 13: Statement to Symbol Translation consists of items in
which the problem is to choose, from five given alternatives, the one 
that correctly translates the verbal statement into an algebraic ex- 
pression. 


Test 14: Functional Relationship consists of items involving di- 
rect and inverse variation. 

Test 15: Problem Analysis I consists of simple word problems. 
The task is to merely indicate the operations necessary to solve the 
problem. 

Test 16: Problem Analysis IT. The test item is a question followed 
by four statements. The task is to indicate the statement that gives 
information irrelevant to the solution of the problem stated in the
question, 


Test 17: Figure Grouping is an adaptation of Thurstone’s figure 
classification test. 


Test 18: Figure Matrix is essentially an adaptation of the Raven
Progressive Matrices.

Test 19: Addition of two, three, and four single-digit and two-digit
numbers.

Test 20: Subtraction of one-digit, two-digit, and three-digit numbers.

Test 21: Multiplication of one, two, three and four-digit numbers
by a one-digit number.

Test 22: Division of two-digit and three-digit numbers by a one-digit
number. When the division is not exact only the remainder is
to be indicated.

² Table 1 lists number of items, time limit, and scoring formula for each test.

Test 23: Conditions 1 is a test of abstract logical reasoning. A
conclusion is judged true or false on the basis of a set of given
conditions.

Test 24: Conditions 2 is another test of abstract logical reasoning. 
The task is to indicate the relation (>, =, or <) which would ex- 
press a true conclusion under a set of given conditions. 

Test 25: Fluency with Mathematical Expressions is a free-answer 
test. Sample item: In how many different ways can you write that 
3 times the sum of c and d is to be divided by m? 

Test 26: Quantitative Relationship. The task is to determine the
relationship between two given expressions and to place the proper
mathematical sign (>, =, or <) between them.

Test 27: Numerical Inequalities consists of items involving funda- 
mental number-operations with inequalities. 

Test 28: Algebraic Inequalities is similar to Numerical Inequalities
except that numbers are replaced by letters, making the operations
more abstract.

Test 29: General Expressions. The test requires the student to discover
the relations between numbers in a sequence and to write the
general expression for that relationship.

Test 30: Number Oddities is essentially a series type test. Sample
item: Observe that

   1 x 8 + 1 = 9
  12 x 8 + 2 = 98
 123 x 8 + 3 = 987
1234 x 8 + 4 = 9876

Now write the next two lines.
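
The arithmetic behind this sample item can be checked mechanically. The short sketch below is an illustration only; the helper name `oddity_line` is hypothetical, and it assumes the intended continuation appends the next digit to the left-hand number and increases the added term by one.

```python
def oddity_line(n):
    """Build the n-th line of the sample item: 12...n x 8 + n = result."""
    left = int("".join(str(digit) for digit in range(1, n + 1)))
    return f"{left} x 8 + {n} = {left * 8 + n}"


if __name__ == "__main__":
    # Lines 1-4 are the ones printed in the item; 5 and 6 are the continuation.
    for n in range(1, 7):
        print(oddity_line(n))
```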


Test 31: Number Relations is a test of fluency and originality in
handling number relations.

Test 32: Number Fluency is patterned after Thurstone's Word-Fluency.
The student is required to write quickly as many numbers
as can be thought of that satisfy certain given conditions.

Test 33: Formulas and Figures consists of items that require the
student to match an algebraic expression with a geometric figure.

Test 34: Mixed Operations is a simple number-operations test.
Each item consists of a heterogeneous series of number operations.




Test 35: Missing Number is a simple number-operations test. The 
test item is an incomplete equation which the student completes by 
supplying the missing number. 

Test 36: Missing Sign is a companion test to Missing Number. 
The task is to supply the mathematical sign that will make an in- 
complete equation true. 


TABLE 1
Description of Test Battery

                                                   Number     Time Limit
Number  Name of Test                               of Items   (minutes)
   1    PMA Number                                    70          6
   2    PMA Reasoning                                 30          6
   3    PMA Space                                     20          5
   4    PMA Verbal-Meaning                            50          4
   5    PMA Word-Fluency                              --          5
   6    CMT Reasoning                                 60         30
   7    CMT Fundamentals                              80         38
   8    Mixed Series                                  40          7
   9    Figure Changes                                40          7
  10    Teams                                         30          6
  11    Number Series (1)                             20         10
  12    Number Series (2)                             20         15
  13    Statement Translation                         20         10
  14    Functional Relationship                       30         20
  15    Problem Analysis I                            20         10
  16    Problem Analysis II                           20         10
  17    Figure Grouping                               30         15
  18    Figure Matrix                                 24         20
  19    Addition                                      70          6
  20    Subtraction                                   80          6
  21    Multiplication                                70         12
  22    Division                                      60          6
  23    Conditions 1                                  30         15
  24    Conditions 2                                  30         20
  25    Fluency With Mathematical Expressions          8         16*
  26    Quantitative Relationship                     25         10
  27    Numerical Inequalities                        20         10
  28    Algebraic Inequalities                        20         10
  29    General Expressions                           20         20
  30    Number Oddities                               20         30
  31    Number Relations                              10         20*
  32    Number Fluency                                10         20*
  33    Formulas and Figures                          20         15
  34    Mixed Operations                              20          6
  35    Missing Number                                20         10
  36    Missing Sign                                  20         10

* Two minutes per item

[Scoring Formula column: entries illegible in the source.]


TABLE 2
Computations for Each of the Tests in the Main Study (N = 150)

[Table body not recoverable from the source.]


Procedure and Results

The battery of thirty-six tests was administered to 160 students
in the eleventh grade of a private secondary school for girls.
Complete data were obtained for 150 subjects and these constituted
the experimental group. Descriptive statistics for each test (Table
2), as well as Pearson product-moment inter-test correlations of
normalized scores were computed.
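
As an illustration of the statistic named above, a product-moment correlation between two tests can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `pearson_r` and the score lists are hypothetical and are not data from the study, and the normalization of the scores is taken as already done.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two score lists of equal length (n >= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical normalized scores of five subjects on two tests.
test_a = [42.0, 50.0, 55.0, 61.0, 47.0]
test_b = [45.0, 48.0, 58.0, 60.0, 44.0]
print(round(pearson_r(test_a, test_b), 2))
```

Applying such a function to every pair of the thirty-six tests would give the 36 by 36 matrix that was then factored.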

Twelve centroid factors were extracted from the thirty-six variable
correlation matrix by means of a high-speed digital computer,
the Illiac. First communality estimates were computed by Thurstone’s
centroid formula. At later stages of the factoring the highest
entries in the array were used as new communality estimates. The
decision to stop at twelve factors was made after it became evident
that any additional factors would make little contribution to the
total variance. Residuals ranged from .090 to —.080 with a mean
of .018.
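
The extraction step described above can be sketched for a single factor. This is a textbook-style illustration of the first centroid factor, not the Illiac program used in the study. The function names and the 3 by 3 matrix are hypothetical, the diagonal estimate shown is the "highest entry in the column" rule, and the sign reflection that the full centroid method applies before extracting later factors is omitted.

```python
import math


def with_highest_entry_diagonal(r):
    """Copy of r whose diagonal holds, for each column, the largest
    absolute off-diagonal entry (used here as the communality estimate)."""
    n = len(r)
    out = [row[:] for row in r]
    for j in range(n):
        out[j][j] = max(abs(r[i][j]) for i in range(n) if i != j)
    return out


def first_centroid_loadings(r):
    """First centroid loadings: column sums divided by the square root
    of the grand total of the matrix."""
    n = len(r)
    column_sums = [sum(r[i][j] for i in range(n)) for j in range(n)]
    root = math.sqrt(sum(column_sums))
    return [s / root for s in column_sums]


def residuals(r, loadings):
    """Residual matrix after removing one factor: r_ij - a_i * a_j."""
    n = len(r)
    return [[r[i][j] - loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]


# Hypothetical correlation matrix (not data from the study).
R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]
R_est = with_highest_entry_diagonal(R)
a = first_centroid_loadings(R_est)
res = residuals(R_est, a)
print([round(v, 3) for v in a])
```

A useful property of this step is that every column of the residual matrix sums to zero, which is why the remaining variables must be reflected before a further centroid factor can be taken out.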

Initial rotation to simple structure was made by Oblimax, the
program by which the Illiac is said to rotate automatically to
simple structure. It was necessary, however, to make adjustments
graphically. Twenty additional rotations were taken. The oblique
factor matrix is given in Table 3.

The interpretation of each factor is based primarily on the tests
with high loadings, that is, .30 or higher, but occasionally tests with
lower loadings are used to give supplementary evidence.
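
The interpretation rule can be illustrated with the loadings listed later for factor L. This is only a sketch: the function name `split_by_salience` is hypothetical, and comparing the absolute value of a loading with the .30 cutoff (so that a large negative loading also counts as high) is an assumption, not something the text states.

```python
# Loadings on factor L as listed in the factor L table.
FACTOR_L = {
    "Number Fluency": 0.43,
    "PMA Word-Fluency": 0.32,
    "PMA Number": 0.24,
    "Number Series 2": -0.36,
}


def split_by_salience(loadings, cutoff=0.30):
    """Separate tests with high loadings (|loading| >= cutoff) from
    tests that give only supplementary evidence."""
    salient = {t: v for t, v in loadings.items() if abs(v) >= cutoff}
    supplementary = {t: v for t, v in loadings.items() if abs(v) < cutoff}
    return salient, supplementary


salient, supplementary = split_by_salience(FACTOR_L)
print("interpret from:", sorted(salient))
print("supplementary:", sorted(supplementary))
```

Under these assumptions PMA Number would serve only as supplementary evidence for factor L, which matches the way it is treated in the discussion of that factor.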


Factor A 
Test Loading 
22 Division .43 
19 Addition .38 
34 Mixed Operations .37 
1 PMA Number .36
12 Number Series 2 .35 
20 Subtraction .34 


All the tests on this factor deal with numbers. Division, Addition,
and Subtraction have been classically considered as tests loaded
in the number factor. PMA Number is also present here. It seems
reasonable to think that this factor represents what has been known
as the number factor described by Thurstone (1947).

Mixed Operations and Number Series both involve simple numerical
work. In Mixed Operations the four fundamental operations
are performed in a heterogeneous manner. In Number Series
it is necessary to discover the number that “does not belong.” The
calculations involved are mainly simple addition and subtraction.

It may be noted that Multiplication has no loading on factor A
and that all the tests on this factor, except for Addition, have
loadings on other factors as well.

3 The correlation matrix as well as other tables for the study are
deposited with the ADI Auxiliary Publications Project. Order Document
No. [illegible], remitting $1.25 for 35 mm. microfilm, or $1.25 for
photoprints to Chief, Photoduplication Service, Library of Congress,
Washington 25, D. C.


TABLE 3
The Oblique Factor Matrix*


Test A B C D E F G H I J K L h²


1 36 01 05-02 08 —06 —01 14 16 20 00 24 1.00 
2 10 44 —03 —05 —06 —08 06 04-09 30 05 20 76 
3 -02 39—15 02 00 —08—01 22 —03 -02 05 02 43 
1 (02 02 01-02 00 Oli —Ol 41 OL O7 00 08 45 
5 = —04 —08 -01 01 01 18 12 02 05—02 32 31 
° li з 04 20 08 42 —14 —02 12 6 017 < 
7 04 18 15 00-01 40 12-01 22 03 03 —03 83 
8 -06 08 08 оз 02 00 —07 —01 33 33 43 06 68 
^ -02 31 —02 04 00 00 00 10 16 57 02 —10 63 
"d -10 -01 00 35 25 —03 00 00 40 02 —10 01 56 
p if 01 17 18 20-07 20 12-01-02 13 —01 45 
18 35 23 —02 33 32 03—02 21-06 14 01—36 77 
2 10 05—02 32 —03 23 07 03 —03 —02 —09 05 69 
6 05 01—05 02 00 48 18 —04-01 34 —01 06 67 
iœ 708 —08 05 16 05 42 –01 10 17 00 06 —06 63 
i 06 07 —05 26 17 —02 19 09 00-24 01 00 45 
i 705 08 34 07 02-01 06 15 07 —09 —03 —Ol 36 
19 99 32 04-001 13 00 09 15 -07 -05 32—10 51 
x% 29-08 05 п 05 о-о 07 12 16 05 =07 46 
ә 2 02 03 -05 13 22 —06 –01 —08 -06 22 00 48 
a 0% 01 02 оз 00 02 06 OL 43 —08 11 —02 58 
з 45 01-02 23 02 02 01-08 14 01 02—07 5 
"i ss -04 10 34 15 05 13 O01 14-08 03 08 62 
a or 08 —24 42 зо —03 03 04 08 02 01 04 68 
28 16 01—02 30 —43 15 –12 —05 00 05 —02 —01 44 
т 05 9 o 31 n 21 00 12 06 08 05 —05 70 
28 "4 702 26 -04 36 05 02 01 10 08 03 —02 35 
0 о -03 09 —04 03 33 53 00-09 16 —01 03 66 
30 "d 03 оз 28-02 03 12 27-08 05 13 O01 62 
31 , 95 00 00 00 03 01 —02 06 02—05 02 75 
3 02 16-17 23 27 17-03 01 05-0 0-0 61 
3 o 0? 07 02 28 02 00-01-02 01—01 43 65 
%4 y 01 —06 03-02 05 46 10 06 00 46—01 52 
% (м ae 2 17 -05 —05 04 27 —07 —05 = za x 
01 10 Е =01 06 — 
36 03 04 —-04 29 -0 20-06 55 


* Decimal points have been omitted for all entries


Factor B
Test Loading
30 Number Oddities .58
2 PMA Reasoning .44
3 PMA Space .39
18 Figure Matrix .32
9 Figure Changes .31
12 Number Series 2 .23
35 Missing Number .23
36 Missing Sign .21


The factor clearly transcends the material of the tests and is not
limited to one particular type. Number Oddities, Number Series,
Missing Number and Missing Sign deal with numbers. PMA Reasoning
uses letters as do some of the items in Number Oddities.
Figure Matrix, Figure Changes, and PMA Space involve forms and
figures.

A study of the mental processes that are involved in doing these
tests indicates that they seem to require the ability to handle several
more or less conflicting Gestalts either simultaneously or in
succession. In Number Oddities it is necessary to recognize a pattern
within a pattern and, as in PMA Reasoning, discover the principle on
which the pattern is built. In PMA Space and in Figure Changes
the subject must keep in mind a given figure or relation while
examining several alternates which must be compared with the given
figure. A similar activity is required in Figure Matrix.

Factor B might be identified with Thurstone’s (1944) Factor E
(Flexibility of Closure), which he described as an ability that
facilitates the retention of a figure in a distracting field. Flexibility of
closure factors identified in other studies (Adkins, 1952; Botzum,
1951; Pemberton, 1952) were defined by such tests as Figure
Classification, Identical Forms, Gottschaldt Figures, and similar tests of
a perceptual nature. In this study, PMA Space, Figure Matrix and
possibly Figure Changes could come under that classification. All
three tests have high loadings on factor B.

It appears that the Flexibility of Closure factor can be extended
beyond the perceptual domain. Pemberton’s (1952) study was
thought to yield strong evidence that flexibility of closure is
associated with analytical reasoning. In the present study, Number
Oddities and PMA Reasoning are series-type tests in which it is necessary
to manipulate the parts separately and yet maintain a clear picture of
the whole configuration. Both tests have high loadings on factor B.
PMA Reasoning is similar to Number Series 1. It may be interesting
to note that Number Series 1 has no significant loading on factor B,
while Number Series 2 has at least a small loading on the factor.

Solution of items in Missing Number and Missing Sign appears
to be greatly facilitated by the ability to see the equation as a
whole and move signs and numbers about mentally until the equality
is satisfied. In the easier items not much insight is needed; they
can readily be solved by inspection. The more difficult items, however,
require a synthetic activity combined with analytical processes.


Factor C
Test Loading
17 Figure Grouping .34
35 Missing Number .29
27 Numerical Inequalities .26
34 Mixed Operations .23
24 Conditions 2 -.24


The interpretation of this factor can only be very tentative
because of the small number of tests defining it and their low loadings.
Figure Grouping is a classification test. The items are easy and
probably require little more than perception of the common property
that distinguishes three of the figures from a fourth one which
is different.

The nature of the tests Missing Number, Numerical Inequalities,
and Mixed Operations suggests an element of flexibility of operation.
In each of these tests it is necessary to shift from one operation
to another and handle almost simultaneously addition, subtraction,
multiplication, and division. Missing Number has a small
loading on factor B which was identified with flexibility of
closure.

It is interesting to note that Conditions 2, a difficult reasoning
test of an abstract nature, has a negative loading on factor C.
It is probably best to consider factor C a residual factor; it has
one of the highest second-order loadings.


Factor D
Test Loading
24 Conditions 2 .42
10 Teams .35
23 Conditions 1 .34
12 Number Series 2 .33
13 Statement Translation .32
26 Quantitative Relationship .31
25 Fluency with Math. Expressions .30
29 General Expressions .28
16 Problem Analysis 2 .26
31 Number Relations .23
22 Division .23
6 CMT Reasoning .20

Conditions 1 and Conditions 2 were constructed as abstract,
mathematical counterparts of Teams, a test of syllogistic reasoning.
These three tests head the list on factor D. In Teams and in
Conditions 1 the relation is stated and the subject must judge it
true or false under the given premises. In Conditions 2 the relations
are not given but must be educed by the subject.

Number Series 2 requires the discovery of the principle on which
the series is built and finding the number that does not belong.

In Statement Translation and in Fluency with Mathematical
Expressions, it is necessary to express in mathematical symbols a
relation given in verbal form.

Quantitative Relationship was designed as a test for the relation
of likeness in a mathematical context. Two members of an expression
are given and the subject must judge whether the first is equal
to, greater than, or less than the second.

General Expressions was considered by the students a difficult
test. It requires the generalization of a principle and the eduction
of an expression defining the relationship.

Factor D seems to have some of the properties of formal logic:
ordering, scaling, organizing.

In all the tests defining factor D, it seems that two things were
necessary for the successful completion of the task: the ability to
manipulate simultaneously a number of relationships and the ability
to use symbols, both of which suggest that a certain amount of
flexibility is an important factor in mathematical reasoning
processes. The criterion test, CMT Reasoning, has a small (.20)
loading on this factor which could be significant in terms of the view
expressed above, particularly if we observe that its companion
criterion test, CMT Fundamentals, has a loading of .00 on the
factor. In CMT Reasoning the various items in each of the four parts
are very unlike. No fixed “set” could be maintained in working any
of the parts. Each of the four sections in CMT Fundamentals, on
the other hand, involves one specific type of operation only:
addition, subtraction, multiplication, or division.


Factor E
Test Loading
27 Numerical Inequalities .36
12 Number Series 2 .32
24 Conditions 2 .30
32 Number Fluency .28
31 Number Relations .27
10 Teams .25
11 Number Series 1 .20
25 Fluency with Math. Expressions -.43


In Numerical Inequalities and in Conditions 2 the task is to state
the relationship between the members of a given expression. The
relation is essentially that of likeness: equal to, greater than, or
less than.

In Number Series 2 it is necessary to discover the number that
disturbs the sequence, a number that is unlike the others in the
row.

Number Fluency calls for the eduction of numbers all of which
must satisfy a given condition, hence a common relation of likeness.

If the factor represents the ability to manipulate quantitative
relationships, then the high negative loading of Fluency with
Mathematical Expressions is not so strange. This test involves the
handling of operational relations and a versatility in translating
from verbal to mathematical symbols.

Another interpretation for the factor may be that it represents
the ability to work under limiting, restrictive conditions. It may
represent the handicap under which much mathematical activity
must be performed: limitations and restrictions imposed by the
hypotheses, definitions, and postulates more or less arbitrarily set.


Factor F
Test Loading
14 Functional Relationship .48
6 CMT Reasoning .42
15 Problem Analysis I .42
7 CMT Fundamentals .40
28 Algebraic Inequalities .33
13 Statement Translation .23
20 Subtraction .22
26 Quantitative Relationship .21


Factor F is clearly dependent on formal training in mathematics.
Functional Relationship involves interpretation of direct and inverse
variation. The criterion tests, CMT Reasoning and CMT
Fundamentals, are standard achievement tests in mathematics. The
simple word problems in Problem Analysis 1 are typical of test items
on current Arithmetic Reasoning Tests. Algebraic Inequalities and
Statement Translation, as well as Quantitative Relationship, require
skills regularly developed in routine classroom work.

Factors defined by scholastic variables have been isolated in the
first order by Comrey (1949), Holzinger and Swineford (1939), and
Carroll (1943), and in the second order by Weiss (1955). In all these
studies, however, tests and grades were used as variables. Whenever
grades are used as variables, extraneous complex elements are
introduced. The factors defined by a combination of test and grade
variables have been described in terms of such personality
characteristics as “ambition,” “will to succeed,” “will to learn.”

The present factor F cannot be identified with those scholastic
factors where grades are the chief determining variables.


Factor G
Test Loading
28 Algebraic Inequalities .53
33 Formulas and Figures .46

Factor G is only a doublet and its interpretation is uncertain. It
is probably a factor unique for this study, caused by the nature of
the two tests. Either the tests were too difficult or the tasks too
unfamiliar. At any rate, it seems that an element of uncertainty
permeated the performance of many subjects on these tests. At best
the factor might be described as a “guessing” factor. It is surprising
that the two tests so characterized should bring out a distinct factor
with so much sharpness.


Factor H
Test Loading
4 PMA Verbal-Meaning .41
35 Missing Number .29
36 Missing Sign .28
34 Mixed Operations .27
29 General Expressions .27
3 PMA Space .22
12 Number Series 2 .21



Test 4, PMA Verbal-Meaning, is one of the standard reference 
tests for the verbal factor. The other tests on factor H are all non- 
verbal. Their verbal content is confined to the instructions. The 
small, but probably not insignificant, loadings of these tests might 
be explained by the observation that many subjects found it easier 
to solve the problems by verbalizing the operations that were to be
performed. The name Verbalization thus seems more appropriate 
than the more commonly used Verbal-Meaning or Verbal Factor. 
The attempt to minimize the verbal factor in the battery was ap- 
parently successful. The verbally-presented tests, Statement Trans- 
lation, Problem Analysis I and II, have loadings of only .03, .10, 
and .09, respectively, on factor H. 


Factor I
Test Loading
21 Multiplication .43
10 Teams .40
8 Mixed Series .33
7 CMT Fundamentals .22


All the tests on this factor deal with numbers except Teams, which
is a test of nonquantitative, syllogistic reasoning.

Multiplication has traditionally been known as one of the best
reference tests for the number factor. Mixed Series obviously requires
manipulation with numbers. CMT Fundamentals consists of
four short achievement tests based on problems in which the
fundamental operations are used with whole numbers, fractions, and
decimals.

Thus it seems that this factor is possibly another number factor
and perhaps represents the ability to perform simple numerical
operations where these are not presented as mere number manipulation.
The high loading of Teams on this factor is difficult to interpret.

Multiplication has an average correlation of .36 with the tests on
factor A, but its loading on factor A is only .09. Factor I correlates
.66 with factor A which indicates a strong relationship between the
factors, but there are no overlapping tests; thus the two number
factors are quite distinct.


Factor J
Test Loading
9 Figure Changes .57
14 Functional Relationship .34
8 Mixed Series .33
2 PMA Reasoning .30


Figure Changes, Mixed Series, and PMA Reasoning are tests 
of the type that Spearman considered to be good tests of “g.” Such 
tests, Thurstone observed, seem to be inductive in character. In this 
battery all three are factorially complex. 

Figure Changes and Functional Relationship are tests in which 
the relationship and part of the answer are given. The mental ac- 
tivity required to complete the item is essentially the eduction of a 
correlate. In Mixed Series and in PMA Reasoning the relationship 
is not given explicitly, but, once it is discovered, the response may 
be the eduction of a correlate—a letter or number that would 
continue the series. 


Factor K 
Test Loading 
33 Formulas and Figures .46 
8 Mixed Series .43 
18 Figure Matrix .32 
36 Missing Sign .26 
34 Mixed Operations .28 


Factor K is difficult to interpret with confidence because all the 
tests that define it are factorially complex. 

Mixed Series and Figure Matrix are inductive in character. In
Formulas and Figures an algebraic expression is to be matched with
a geometric figure; relationships are sought. The ability to educe
abstract relations appears to be the main component of the mental
processes that are probably involved in the solution of the tests
represented on this factor.


Factor L 
Test Loading 
32 Number Fluency .43
5 PMA Word-Fluency .32 
1 PMA Number .24 
12 Number Series 2 —.36 


Factor L is obviously a fluency factor. Both tests, Number
Fluency and PMA Word-Fluency, require the subject to produce
responses quickly under a simple restricting condition. In Word
Fluency the words have to begin with a given letter; in Number
Fluency the numbers have to satisfy a given condition: even, odd,
multiples of 3, etc.

The PMA Word-Fluency test is sometimes considered to be a test
of some degree of creativity or ideational fluency. PMA Number
also has a small loading on this factor. It seems that the problems
may have been solved in a more or less automatic way because of
familiarity with the material.

The test Number Fluency was developed for the purpose of
discovering whether there would be any relation between word-fluency
and number-fluency. Apparently there is and the link may well be
a certain kind of reaction-time factor, the speed and ease with
which one characteristically reacts to a familiar stimulus. Thus
fluency may be a personality characteristic.

Two other tests, Number Relations and Fluency with Mathematical
Expressions, were designed to study the relation between
word-fluency and number-fluency. The restrictive conditions, however,
being more rigid, these tests have vanishing loadings on factor
L. It seems that the speed factor as such did not contribute
significantly to the total variance of the tests designed to study
number fluency.


Second-Order Analysis


Four second-order factors were extracted by the multiple-group
method from the matrix of correlations between the twelve primary
factors given in Table 4. Rotation to an oblique simple structure
was performed graphically. The rotated second-order factor matrix
is given in Table 5.

Factor A’ is defined by the four primaries C, D, B, and L. It is
quite general but has neither a verbal nor a numerical factor.
Its characteristic element appears to be flexibility in handling
perceptual relations and abstract mathematical symbols.

Factor B’, defined by the two number primaries A and I, is
a number factor. It may be significant that the primary D,
defined almost exclusively by tests of a mathematical nature, has
a high negative loading on this second-order number factor.

Factor C’ probably represents abstraction. Both primary factors,
E and K, which define it, require the ability to educe and manipulate
relations of all kinds. If the interpretation of the primary G as
a guessing factor is correct, then its high negative loading
on this factor should not be surprising.

It is apparent that factor D’ has much in common with factor A’.
The differentiating feature is the presence of primaries H and F


TABLE 4
Correlations Between the Primary Factors

       A     B     C     D     E     F     G     H     I     J     K     L
A    1.00
B     .10  1.00
C    -.07   .40  1.00
D    -.21   .57   .48  1.00
E      ?     ?     ?    .13  1.00
F     .29   .36   .01   .37   .23  1.00
G     .25   .20  -.04   .15  -.11   .12  1.00
H     .97   .32  -.00   .20   .19   .63   .09  1.00
I     .66   .14  -.13  -.20  -.04   .26   .28   .43  1.00
J    -.18   .06   .14   .16   .19  -.14   .00  -.09  -.23  1.00
K      ?     ?     ?     ?     ?     ?   -.32  -.07  -.20   .05  1.00
L      ?    .51   .28   .43   .20   .40  -.02   .49   .27   .04   .23  1.00

? Entry illegible in the source.


with significant loadings on factor D’ and the alternating roles of 
primaries L and B on the two second-order factors. It thus appears 
that factor D’ might represent the influence of scholastic variables 
on the activities defined by factor A’. 


Discussion of Results 


The aim of the present study was to explore the domain of 
mathematical ability in general and in particular to study the 
nature of factors that seem to enter into mathematical thinking. In 
the hope of measuring something more basic than acquired abili- 


TABLE 5
Rotated Factorial Matrix for the Primaries in the Second-Order*

        A’    B’    C’    D’
A       05    55   -04    04
B       56    12    12    36
C       65    03    09    03
D       60   -34   -09    60
E      -09   -02    40    15
F       04    08    18    51
G       25    16   -34    09
H       00    00     ?    60
I       00    65    04   -02
J       18   -01    16    -1
K       07   -01    61    00
L       31    10    17    50

* Decimal points have been omitted for all entries
? Entry illegible in the source.


ties, an effort was made to so select and develop the tests as to 
minimize, as much as possible, the effect of formal training in 
mathematics. A scholastic factor, however, did emerge both in the 
first and in the second order. 

A unique development of the study is the splitting of the number 
factor into two distinct, though correlated, factors one of which, 
factor A, was identified with what has commonly been called Thur- 
stone’s N. The other number factor, factor I, was more difficult to 
interpret. Its relation to the number factor N is found through the 
leading saturation of Multiplication on it and through its union 
with factor A on the second-order number factor B’.

Factor A is best represented by addition and seems to define an 
ability that is almost on the borderline of reasoning, an automatic 
sort of manipulation of numbers requiring perhaps only rote mem- 
ory of various number combinations. Factor I is best represented 
by multiplication and shows some relation to deductive reasoning 
by the presence of such tests as Teams and Mixed Series. 

Thurstone (1947) may have anticipated an outcome of this kind
when he stated that “a primary factor may be found to be itself a
[illegible] when a part of a domain is investigated with large
batteries.”

Whether a factor is isolated in the first or in the second order
depends upon the selection of the variables in the test battery and on
the method of analysis. In the present study only eight of the
thirty-six tests were nonmathematical. The mathematical tests
were [illegible] in nature. That the number factor has become separated
in this study is not so surprising as the finding that Multiplication
should have little in common with the other tests of fundamental
operations. This curious behavior of multiplication merits
further investigation.

The fact that only the two number primaries define the second-order
factor B’ and that both primaries have vanishing loadings
in the other second-order factors, may mean that numerical
ability, as represented by the number factor, is not an important element
in mathematical ability as here studied. This is not to be understood
as though computational skills are unimportant. It appears that
though mathematical ability presumes the number factor the converse
is not true.


The nature of several factors strongly suggests that the mathematical
processes appear to be mainly processes of eduction, organization,
and manipulation of relations. Factors B, D, E, and K,
each in its own way, represent abilities that may, in the final
analysis, be reduced to a perception of relations and the use of this
knowledge in the solution of problems.

The above interpretation identifies mathematical thinking with 
the second principle of noegenesis. If the principles of noegenesis do 
form the basis of all intellectual activity, then mathematical ability 
as here studied becomes part of intelligence. This conclusion in
itself is neither new nor surprising. The striking thing is that
apparently it is the second principle of noegenesis, as distinguished
from the third, that is the more closely related to mathematical 
ability. Factor J which was interpreted as eduction of correlates, the 
third principle of noegenesis, is practically orthogonal to the rest of 
the first-order factors and so uncorrelated with them. Its saturations 
on the second-order factors are all negligible. This finding, of course, 
may be only the effect of selection in the present study. Final con- 
firmation will have to come from further studies especially de- 
signed to investigate this proposition. Such studies may lead to 
valuable information about the structure of mathematical ability 
as related to general intelligence. 

Factor B was interpreted as flexibility of closure. The element of 
flexibility that seems to pervade most of the tests on that factor 
suggests that the more “Gestalt-free” an individual is the better his 
performance. This interpretation makes it possible to identify factor 
B with Rimoldi’s (1951) Factor A which he defined as a reasoning 
factor that stressed “the essentially dynamic character of the 
process, and mainly the plasticity required to perform such an
activity” in the complex situations presented in the tests.

Perhaps this flexibility or plasticity of operation is related to the 
“many-sided nature of thought material” that Duncker (1945) 
discusses in relation to problem solving. One-sidedness, or poverty
of thought material (an inability to see more than one aspect at
a time or to restructure a concept once formed), is there considered
to be the chief distinguishing characteristic of poor thinking and at 
the same time of a limited kind of mathematical ability. 

The second-order analysis was motivated by an expectation of
practically useful and fundamentally important results. To some
extent this expectation has been realized. It is in the second order
that we find some clues as to the nature of mathematical ability.
The second-order factors A’ and C’ seem to represent the basic
characteristics of mathematical thinking, namely, eduction and
manipulation of relations, and the ability to abstract from perceptual
properties the essential mathematical concepts necessary for
the solution of the proposed problems. The fact that it is primary
factors E and K that define the second-order factor C’ seems to
imply that mathematical ability may to some extent depend upon
the degree of fluency and flexibility with which one can work under
various forms of restrictions. B’ is a number factor. There is no
indication that this number factor can be identified with mathematical
ability. The clustering of primaries H, F, and L on factor D’
suggests that it may be regarded as a second-order scholastic factor
which represents the effect of formal training on the innate abilities
represented in A’.


Conclusion 


The findings of this study suggest that mathematical thought
processes appear to be mainly processes of eduction, organization,
and manipulation of relations. Mathematical thinking seems to be
characterized by a fluency and flexibility of thought-material under
limiting conditions such as are often imposed by the assumptions,
postulates, and definitions of a mathematical problem. The number
factor appears to be quite unrelated to the other factors in terms of
which mathematical ability was described in this study.




EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT
Vol. XXII, No. 1, 1962


MMPI AS A SCREENING DEVICE IN AN ACADEMIC
SETTING


JOHN H. HEWITT and LEON A. ROSENBERG
Army Medical Service School


The ideal screening device is an instrument that will indicate
which students will pass a given course of instruction and which will
fail. Recent studies (Briskin & Stennis, 1957; Fulkerson, Freud &
Raynor, 1958; King, 1959) have explored, with varying degrees of
success, the predictive value of the Minnesota Multiphasic
Personality Inventory (MMPI).

In the search for an effective screening instrument, class advisors
of three enlisted courses at the Army Medical Service School have
used the MMPI, in combination with personal interviews, in an
attempt to reduce attrition. The present study evaluates MMPI data
obtained from subjects enrolled in these courses during the summer
[...]. The three courses, and a brief description of their content,
follow.

1. Neuropsychiatric Nursing Procedures: a four-week course to
train technicians for ward work with neuropsychiatric patients.
Patient management, security, and rehabilitation are discussed in
detail, together with many other typical ward problems.

2. Clinical Psychology Procedures: an eight-week course to prepare
students to assist the clinical psychologist by administering and
scoring the Wechsler Adult Intelligence Scale, MMPI, and various
other standardized tests.

3. Social Work Procedures: an eight-week course to train students
to assist the social work officer by abstracting medical records,
taking social histories from patients, and helping patients through
the general problems of hospitalization.




In all three courses, stress is placed on understanding the basic
concepts of normal personality development. Students are given
additional instruction in the symptoms and classification of psychi- 
atric illness, and are presented rather extensive information regard- 
ing the problems of the mentally ill. In addition, students attend 
weekly case conferences where various neuropsychiatric cases are 
presented and discussed. 

Experience had indicated that personal interviews alone were gen- 
erally unreliable in eliminating potential failing students from 
courses, because “unexpected” failures continued to occur. The ob- 
ject of the present study was to determine the feasibility of using 
the MMPI as a predictive instrument to assist interviewers in
determining which students would pass and which would fail a given
course.

The hypothesis advanced prior to the study was that the MMPI
would profitably supplement information obtained through
pre-course interviews with students. It was thought that the MMPI
would provide the interviewers with special clues concerning areas
of potential abnormality, and would thus allow the investigation of
subject matter that might otherwise have remained unprobed.


Subjects 


One hundred twenty-two enlisted students were subjects for the
present study. For purposes of analysis, the subjects were divided
into two basic groups: 1) Group A, consisting of 106 subjects who
were screened, using both the MMPI and a personal interview, and
admitted to a course; and 2) Group NA, consisting of 16 subjects
who were screened in the same way and not admitted to a course.

Two additional groups of subjects were then formed from the 106
subjects of Group A: 1) Group G, consisting of 95 subjects who were
graduated from a course; and 2) Group F, consisting of 11 subjects
who failed a course.

Table 1 gives additional information, arranged both by class and
by group, for all subjects.


Procedure 


The 122 subjects were administered the Shortened Version of the
Group Form of the MMPI (Hathaway & McKinley, 1951) on the
first day of class. Subjects were told that final selection for a
course would be based upon the results of the MMPI and a personal
interview.


TABLE 1
Additional Data for the 122 Subjects, by Course and Group

                                               Mean Years
Group                        N      Mean Age   Education
NP Nursing                   69       21.8       13.8
Clinical Psychology          18       20.8       14.2
Social Work                  19       23.8       14.3
All Groups (Group A)        106 a     22.0       14.0
Group NA                     16       19.5       13.1
Group G                      95       21.9       14.0
Group F                      11 b     22.6       13.3
All Groups                  122       21.7       13.9

a The 16 who were not admitted (Group NA) are not included.
b Number of failures by course: NP Nursing, 4; Clinical Psychology, 5; Social Work, 2. Reasons
for failure: academic deficiency, 6; emotional unsuitability, 3; insufficient motivation, 2.
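The summary rows of Table 1 can be checked against the course rows by pooling the means with the group sizes as weights. The following is a minimal sketch of that arithmetic, not part of the original study; the function and variable names are illustrative only, and the values are those printed in Table 1.

```python
# Course rows of Table 1: (N, mean age, mean years of education).
courses = {
    "NP Nursing": (69, 21.8, 13.8),
    "Clinical Psychology": (18, 20.8, 14.2),
    "Social Work": (19, 23.8, 14.3),
}


def pooled_mean(rows, column):
    """N-weighted mean of one column (1 = age, 2 = education) over the rows."""
    total_n = sum(row[0] for row in rows)
    return sum(row[0] * row[column] for row in rows) / total_n


course_rows = list(courses.values())
group_a_n = sum(row[0] for row in course_rows)
group_a_age = pooled_mean(course_rows, 1)
group_a_edu = pooled_mean(course_rows, 2)

# Pool the printed Group A row with Group NA to check the final "All Groups" row.
all_rows = [(106, 22.0, 14.0), (16, 19.5, 13.1)]
all_age = pooled_mean(all_rows, 1)
all_edu = pooled_mean(all_rows, 2)

print(group_a_n, round(group_a_age, 2), round(group_a_edu, 2))
print(round(all_age, 2), round(all_edu, 2))
```

Within rounding to one decimal place, the pooled values agree with the printed Group A row (106; 22.0; 14.0) and the final row (21.7; 13.9).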

Three teams of interviewers were used. Each team consisted of a 
Psychiatrist and the class advisor of the particular course for which 
the student was applying. The interviews were held during the second 
day of class, and students were advised of their status on the third 
day of class, or sooner.


Results 


Table 2 compares the mean raw scores obtained by the 106 subjects
of Group A, who were admitted to a course, with the mean raw
scores obtained by the 16 subjects of Group NA, who were not
admitted. The composite profiles were essentially the same, with no
large variations occurring in any scale.

Table 3 further explores the problem of effective student screening
with the MMPI. It presents a comparison of the mean raw scores
earned by the 11 subjects of Group F, who failed, with the mean raw
scores earned by the 95 subjects of Group G, who were graduated.
The mean raw scores of the two groups were quite similar, but had
the interesting characteristic that in every scale but one (F scale),
the graduates' scores were higher. Although the amount by which the
scales were higher was small in every case, the graduates' profiles,
nonetheless, presented a less “normal” appearance than did the
profiles of the 11 who failed.
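The claim that the graduates scored higher on every scale but F can be verified directly from the means printed in Table 3. The sketch below is illustrative only. It assumes the two rows whose labels are missing in the printed table are L and D (the usual MMPI scale order), and it reads the garbled entries "26 45", "25 64", and "9,08" as 26.45, 25.64, and 9.08.

```python
# Mean raw scores from Table 3 as (Group F, Group G).
table3 = {
    "L": (3.64, 3.66),
    "F": (4.18, 3.72),
    "K": (16.09, 16.97),
    "Hs": (3.18, 4.32),
    "Hs + .5K": (11.64, 13.08),
    "D": (18.45, 18.67),
    "Hy": (20.18, 20.62),
    "Pd": (16.36, 16.58),
    "Pd + .4K": (22.55, 24.63),
    "Mf": (24.82, 26.54),
    "Pa": (8.18, 9.08),
    "Pt": (10.00, 10.01),
    "Pt + 1K": (26.45, 26.98),
    "Sc": (9.55, 9.63),
    "Sc + 1K": (25.64, 26.56),
    "Ma": (16.27, 16.99),
    "Ma + .2K": (19.55, 20.38),
}

# Scales on which the graduates (Group G) have the higher mean, and the rest.
graduates_higher = [s for s, (failed, grad) in table3.items() if grad > failed]
failures_higher = [s for s, (failed, grad) in table3.items() if failed >= grad]

print(len(graduates_higher), "rows higher for graduates")
print("rows higher for failures:", failures_higher)
```

Several of the differences are very small (for example 10.00 against 10.01 on Pt), which is consistent with the text's remark that the amount was small in every case.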




TABLE 2
Mean Raw Scores for Two Groups of Subjects

                 Group NA      Group A
Scale            (N = 16)      (N = 106)
L                  4.19          3.66
F                  2.88          3.76
K                 17.19         16.88
Hs                 4.38          4.20
Hs + .5K          13.19         12.93
D                 17.50         18.65
Hy                22.31         20.58
Pd                17.13         16.56
Pd + .4K          24.13         23.25
Mf                25.63         26.36
Pa                 9.06          8.99
Pt                 8.56         10.05
Pt + 1K           25.75         26.92
Sc                 7.56          9.62
Sc + 1K           24.75         26.46
Ma                17.38         16.92
Ma + .2K          21.00         20.29


TABLE 3
Mean Raw Scores for Two Additional Groups of Subjects

                 Group F       Group G
Scale            (N = 11)      (N = 95)
L                  3.64          3.66
F                  4.18          3.72
K                 16.09         16.97
Hs                 3.18          4.32
Hs + .5K          11.64         13.08
D                 18.45         18.67
Hy                20.18         20.62
Pd                16.36         16.58
Pd + .4K          22.55         24.63
Mf                24.82         26.54
Pa                 8.18          9.08
Pt                10.00         10.01
Pt + 1K           26.45         26.98
Sc                 9.55          9.63
Sc + 1K           25.64         26.56
Ma                16.27         16.99
Ma + .2K          19.55         20.38


Table 4 deals with the MMPI's ability to predict the degree of
academic success a given student will achieve. The mean raw scores
of 15 subjects (Group H) who earned the highest percentage grades
were compared with those of 15 subjects who earned the lowest
percentage grades (Group L). Group H earned grades ranging from
92-[...] per cent, and Group L earned grades ranging from 71-77 per
cent. The profiles of the two groups were, again, roughly similar.
Group H was approximately 3.4 raw score points higher on the D
Scale, and approximately 7.1 raw score points higher on the Mf
Scale. The relatively high scores of these two scales were not
investigated statistically, but several inferences will be advanced
later.


TABLE 4
Mean Scaled Scores for Highest- and Lowest-Ranking
Groups, Academically

                 Group H       Group L
Scale            (N = 15)      (N = 15)
L                  3.13          4.00
F                  4.27          4.20
K                 16.40         16.67
Hs                 5.53          3.60
Hs + .5K          14.00         12.33
D                 21.07         17.67
Hy                21.53         20.47
Pd                17.27         18.20
Pd + .4K          23.87         24.73
Mf                31.47         24.40
Pa                10.33          7.98
Pt                11.07         12.07
Pt + 1K           27.60         28.73
Sc                 9.87         10.53
Sc + 1K           26.27         27.33
Ma                19.80         17.73
Ma + .2K          20.40         21.13


A further investigation concerned perhaps the most obvious
method of reading the MMPI for screening: the number of “peaks,”
or T scores of 70 or more, that occur in a given profile. Since a
T score of 70 is two standard deviations from the mean (T = 50),
it alerts the examiner to a potentially dangerous subject-area for
the testee. Two or more profile peaks in a single record are
especially important. Because of the obvious irregularity that a
peak imposes upon an otherwise innocuous profile, examiners are
immediately alerted to further probe the questionable area.

Table 5 gives the number of peaks occurring in the profiles of the
16 subjects who were not admitted, the 11 subjects who failed, and
the 95 subjects who were graduated.
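The relation between a T score of 70 and two standard deviations follows from the linear T-score convention (mean 50, standard deviation 10). The sketch below illustrates that conversion and the peak count. It is a minimal illustration under stated assumptions: the norm mean, norm standard deviation, and example profile are hypothetical values chosen for the arithmetic, not MMPI norms or data from the study.

```python
def t_score(raw, norm_mean, norm_sd):
    """Linear T score: 50 plus 10 times the standard (z) score."""
    return 50 + 10 * (raw - norm_mean) / norm_sd


def count_peaks(t_scores, cutoff=70):
    """Number of scales in a profile at or above the cutoff T score."""
    return sum(1 for t in t_scores if t >= cutoff)


# Hypothetical norms: a raw score two SDs above the mean gives T = 70.
example_t = t_score(raw=30, norm_mean=20, norm_sd=5)

# Hypothetical profile of T scores on five scales.
profile = [55, 72, 48, 70, 63]
peaks = count_peaks(profile)

print(example_t, peaks)
```

A record like the hypothetical profile above, with two scales at or beyond the cutoff, is the kind the text describes as calling for further probing in the interview.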




TABLE 5
Scaled Score “Peaks” for Three Groups of Subjects

                       Total Number    Average Number
Group           N      of Peaks        of Peaks
Group NA        16         18             1.13
Group F         11          6             0.55
Group G         95        103             1.08


It is obvious, from Table 5, that merely counting peaks in profiles
is an unreliable predictor of success or failure in a course. The 11
who failed had only about one-half as many peaks per man as did
the 95 graduates. Furthermore, analysis revealed that 25 of the 95
graduates (26%) had two or more profile peaks.
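The averages in Table 5 and the ratios quoted above can be recomputed from the totals. The following is a small sketch of that arithmetic using only the figures printed in the table and text; the variable names are illustrative.

```python
# Table 5 rows: group -> (N, total number of peaks, printed average).
table5 = {
    "Group NA": (16, 18, 1.13),
    "Group F": (11, 6, 0.55),
    "Group G": (95, 103, 1.08),
}

# Recompute peaks per man for each group.
averages = {group: peaks / n for group, (n, peaks, _) in table5.items()}

# Ratio of the failing group's average to the graduates' average.
ratio_f_to_g = averages["Group F"] / averages["Group G"]

# Share of graduates with two or more peaks (25 of 95).
share_multi_peak = 25 / 95

for group, value in averages.items():
    print(group, round(value, 3), "printed:", table5[group][2])
print(round(ratio_f_to_g, 2), round(100 * share_multi_peak, 1))
```

The recomputed averages match the printed ones to two decimals, the failing group's average is close to one-half that of the graduates, and 25 of 95 is about 26 per cent.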

Another indicator of deviancy is the Critical Item Scale.¹ This
scale gives the examiner a count of items which are seldom answered
in the “deviant” direction. Theoretically, the critical item count for
a well-adjusted subject is relatively low since these items describe
only the most blatant symptoms. Conversely, one expects the count
to rise for more disturbed subjects. For example, a typical item from
this scale is, “I hear strange things when I am alone” (scored if
answered “True”).

Table 6 presents the critical item score obtained by the 16
subjects not admitted; the 11 who failed; and three groups of 16
subjects each, who were randomly selected from the 95 graduates.

The 16 subjects who were not admitted to a course had lower
critical item scores than did the 11 who failed, and a sampling of
48 of 95 graduates. The critical item count indicated no essential
difference between the five groups.


TABLE 6
Critical Item Analysis

                          Total              Average
Group              N      Critical Items     Number
Group NA           16          27             1.69
Group F            11          31             2.82
“Success” #1       16          46             2.93
“Success” #2       16          39             2.47


¹ Developed by Grayson, Army Medical Service School mimeo number 8-
410-403-1, revised 1960.


Discussion 

The data of the present study point to some serious limitations in
the effectiveness of the MMPI as a screening device in this particular
setting. First of all, it was shown (Table 2) that there was no
essential difference in the composite profiles of students who were not
admitted to a course and students who were admitted. There are
several possible explanations to account for this apparent lack of
discrimination: a) the difference between the profiles was too small
to be recognized by examiners; b) examiners ignored the profiles
and relied instead on interview observations; or c) the test did not
sample the criteria used by examiners in judging a student.

Second, the composite profiles of a group which failed and one
which was graduated (Table 3) were practically identical. If
anything, the graduating subjects presented a slightly more abnormal
composite profile than did the failing subjects since they scored
higher on every scale but one. Screening by the most expert
interpreter of the MMPI would have been futile in this instance. In fact,
rigorous and literal screening on the basis of the MMPI might well
have been prejudicial against the successful group.

Third, there was little difference apparent between the composite
profiles of a “success” group (subjects earning the highest academic
grades) and a “failure” group (subjects earning the lowest academic
grades). The only relatively large discrepancies were noted in the
D and Mf scores of the “success” group, which were about 3.4 and
7.1 raw score points higher, respectively. A possible twofold
explanation is: a) these subjects were more anxious about doing well,
which caused the elevation in the D Scale; and b) they were more
intelligent, better-educated, and had more wide-ranging interests
than did the “failure” group, which accounted for the elevation in
the Mf Scale.

Fourth, the data of Table 5 suggested that a traditional way of
interpreting MMPI profiles, counting “peaks,” did not predict who
would pass and who would fail a given course. Those who passed
actually had a higher average number of peaks than did those who
failed a course. Interviews undoubtedly probed questionable areas
where these were indicated by peaking, but they obviously found
little or nothing to indicate handicaps that would seriously impair
students' performances in class or on the job. A perusal of the
profiles of 16 subjects who were not admitted to a course provided strong
indication that there was little or no difference between the
individual profile configurations of this group and the group that was
admitted.

And last, the critical item count performed on each prospective
student gave no indication of how the student would fare
academically. Examiners did not have access to this score at the time of the
screening interview, but there is no reason to believe that their
judgments would have been improved if they had. It is perhaps
significant, however, that the statement most commonly answered in the
deviant direction by the 11 who failed was, “I have the wanderlust
and am never happy unless I am roaming or traveling about” (scored
“True”). More enigmatic were the two most commonly scored
statements of the three “success” groups, “I have never indulged in any
unusual sex practices” (scored “False”), and “I have had very
peculiar and strange experiences” (scored “True”). Both were
answered a total of 17 times (or an average of 5.7 times per group)
in the deviant direction.

The data of the present study indicate that the MMPI is a
relatively ineffective screening instrument for the three courses under
consideration, and there are certainly implications that it may also
be ineffective in other similar academic situations. It yields little
useful information and may actually influence the screener to
reject students who will eventually succeed. It is true that the MMPI
may suggest areas where questions can be directed, especially when
there is supporting biographical information. But it is questionable
whether the two hours needed for administering and scoring each
record justify the rather meager results.

The MMPI did not meet the first criterion for a good screening
instrument since it did not supply, in a great majority of cases, data
that were useful in helping the screener to decide if a student would
succeed or fail.


Summary 


Ninety-five graduates of three courses at the Army Medical
Service School were administered, upon admittance, the Shortened
Version of the Group Form of the MMPI in an attempt to determine the
predictive value of this test. Profiles obtained from this group were
compared with those of 16 subjects who were not admitted to a
course, and with those of 11 subjects who failed a course. Results
indicated that the profiles obtained by all three groups were
essentially the same, with those of the group admitted being slightly more
“abnormal.” Traditional methods of selecting deviant profiles, such
as counting “peaks” or critical items, were useless in predicting
success or failure. The study found no single measure or combination of
measures by the MMPI which would provide interviewers with clues
concerning a given individual's chances for success in a course.


REFERENCES 


Briskin, G. J. and Stennis, J. W. “Improving Predictability of
Minnesota Multiphasic Personality Inventory.” United States
Armed Forces Medical Journal, VIII (1957), 589-543.

Fulkerson, S. C., Freud, S. L., and Raynor, G. H. “The Use of the
MMPI in the Psychological Evaluation of Pilots.” Journal of
Aviation Medicine, XXIX (1958), 122-129.

King, B. T. “Predicting Submarine School Attrition From the
Minnesota Multiphasic Personality Inventory.” United States Navy
Medical Research Laboratory Report, CCCXIII (1959), 1-25.

Hathaway, S. R. and McKinley, J. C. Minnesota Multiphasic
Personality Inventory. New York: Psychological Corporation, 1951.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 1, 1962


PSYCHOTIC SYMPTOM PATTERNS IN A BEHAVIOR 
INVENTORY 


MAURICE LORR 
Catholic University and Veterans Administration 


AND 


JAMES P. O'CONNOR 
Catholic University


Some six behavior inventories have been developed for use by
psychiatric aides and nurses to describe psychotic patient behavior in a
hospital. The pioneering McReynolds (1952) Hospital Adjustment
Scale and its derivatives, as developed by Shatin and Freed (1955)
and by Aumack (1957), represent efforts to measure the construct
of hospital adjustment. The Flanagan and Schmid (1950) check list
is based on a critical incident approach. Guertin and Krugman
(1959), on the other hand, have applied the techniques of factor
analysis towards the development of a form. More recently, Lorr,
O'Connor, and Stafford (1960) constructed an inventory, the
Psychotic Reaction Profile (PRP), designed to measure 10 reaction
patterns previously identified in various factorial studies of interview
and ward ratings.

In the construction of PRP, 10 dimensions were postulated. All
except one were based on earlier studies by the authors that utilized
ward and interview information. The postulated dimensions
were: Manic Excitement, Withdrawal with Psychomotor Retardation,
Paranoid Projection, Perceptual Distortion (Hallucinations),
Conceptual Disorganization (Incoherent and Disordered Speech),
Resistiveness, Depressive Agitation, Motor Disturbances
(Mannerisms [...]), and Hostile Belligerence. Finally a personality
variable of Dominance versus Submissiveness was introduced as
potentially useful in discriminating between patients. The purpose of
the present study was to determine whether the ward behavior factors
PRP was designed to measure could be confirmed. Those parameters
confirmed could then be compared qualitatively with ward as well as
interview factors previously identified. A rigorous check of
correspondence between the two was not contemplated.


Method 


The sample studied included 500 male psychotic patients
randomly drawn from a larger sample of 688 received from 27 Veterans
Administration, 15 state, and five private psychiatric hospitals in the
United States and Canada. The cooperating hospitals were asked to
secure observations by their more intelligent and conscientious
nurses and aides on a representative sample of their patients from all
services. The only restrictions were that patients included be less
than 55 years of age, ambulatory, and primarily functional
psychotics. Observers selected were to be informed of the goals of the
research and to be given ample opportunity to familiarize themselves
with the inventory. Inventories were completed under supervision of
a psychologist or psychiatrist at the close of a one-week observation
period.

PRP is comprised of 85 statements to be rated true or not true by
a nurse or attendant. The process of development included the
construction of 400 behavior statements, each designed to measure one
of the 10 postulated areas of behavior disturbance described above.
A smaller set of 172 were selected on the basis of the consensus of
three out of four judges, who agreed to the allocation of a statement
to one and only one pattern. The statements included in the PRP
were selected on the basis of procedures described elsewhere (Lorr,
et al., 1960).

The 85 × 85 table of correlations analyzed was product moment
throughout. The multiple group centroid procedure of factoring
described by Guttman (1952) was followed. For this purpose an a
priori simple structure weight matrix may be constructed. The rows
represent scale variables and the columns represent postulated
factors. Each variable is given a unit weight in the factor column to
which it is allocated and a zero weight in all other columns. The basis
for allocating a variable in the 85 × 10 matrix was its original
categorization by three out of four judges as described above.
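The a priori weight matrix described above can be sketched as follows. This is a minimal illustration, not the authors' computation: the six-variable correlation matrix and the function names are hypothetical, unities are left in the diagonal rather than communality estimates, and only a first step is shown (correlating each variable with the unit-weighted composite of each postulated group), not the extraction of orthogonal factors or the rotation.

```python
import numpy as np


def weight_matrix(assignments, n_factors):
    """A priori 0/1 weights: rows are variables, columns are postulated factors."""
    weights = np.zeros((len(assignments), n_factors))
    weights[np.arange(len(assignments)), assignments] = 1.0
    return weights


def group_structure(corr, weights):
    """Correlation of every variable with each unit-weighted group composite."""
    cross = corr @ weights                               # covariances with composites
    composite_var = np.diag(weights.T @ corr @ weights)  # variance of each composite
    return cross / np.sqrt(composite_var)


# Hypothetical toy data: six variables built from two uncorrelated clusters.
pattern = np.array([[.8, 0], [.7, 0], [.6, 0], [0, .8], [0, .7], [0, .6]])
corr = pattern @ pattern.T
np.fill_diagonal(corr, 1.0)

assignments = [0, 0, 0, 1, 1, 1]          # judged allocation of each variable
weights = weight_matrix(assignments, n_factors=2)
structure = group_structure(corr, weights)

print(np.round(structure, 2))
```

In the study itself the weight matrix would have 85 rows and 10 columns, with each row's single unit weight placed according to the judges' categorization.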




Application of the weight matrix indicated that the clusters
of variables representing Perceptual Distortions, Conceptual
Disorganization, and Motor Disturbances were not independent.
Accordingly, the three clusters of variables were all allocated with unit
weights to one factor column of the weight matrix. Subsequently
eight orthogonal factors were extracted simultaneously. To sharpen
differences, the single plane procedure for rotation was applied
without regard to the meaning of the variables until a satisfactory
simple structure was attained. Not more than one or two rotations
were required for any factor.


The Results 


To be confirmed fully, each variable in a postulated pattern should
have its sole significant factor loading in the factor column to which
it was allocated a priori. If we designate a factor loading of .30 as
significant, then the extent of confirmation can easily be determined.
The behavior disturbance patterns regarded as fully confirmed
were Paranoid Projection, Hostile Belligerence, Resistiveness,
Dominance, Manic Excitement, and Agitated Depression. About a third of
the variables predicted to fall on Withdrawal failed to reach .30 or
loaded highest on another factor. The patterns of Perceptual
Distortion, Conceptual Disorganization, and Motor Disturbances, as
mentioned earlier, were combined and extracted as one factor.

In the presentation that follows, each common factor will be
discussed in turn. Only variables having loadings of .30 or higher will be
listed. Whenever a variable has a significant loading on a factor other
than the one being described, this factor and its loading will also be
given. The factor loading of an item keyed Not True is preceded by a
minus sign.
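The confirmation rule stated above (a variable's only loading of .30 or more, in absolute value, falls in its a priori column) can be written as a short check. The sketch below is illustrative only; the loadings and assignments are hypothetical toy values, not figures from the study, and the function name is an assumption.

```python
import numpy as np

SALIENT = 0.30  # loading treated as significant in the text


def fully_confirmed(loadings, assigned):
    """True where a variable's sole salient loading is on its assigned factor."""
    loadings = np.asarray(loadings, dtype=float)
    salient = np.abs(loadings) >= SALIENT        # sign reflects keying only
    rows = np.arange(loadings.shape[0])
    on_own_factor = salient[rows, assigned]
    return on_own_factor & (salient.sum(axis=1) == 1)


# Hypothetical loadings of four variables on two factors.
toy_loadings = [
    [0.58, 0.10],   # salient only on its own factor: confirmed
    [0.38, 0.31],   # salient on a second factor as well: not confirmed
    [0.12, -0.45],  # negatively keyed item, salient on its own factor: confirmed
    [0.25, 0.05],   # fails to reach .30 anywhere: not confirmed
]
toy_assigned = [0, 0, 1, 0]

confirmed = fully_confirmed(toy_loadings, toy_assigned)
print(confirmed.tolist())
```

The second toy row corresponds to the kind of entry the factor lists report with a parenthetical cross-loading, such as “.38 (HB .31)”.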


Factor TD—Thinking Disorganization (Ward) 


Об of time talking to himself, 
himself about imaginary or real faults. 
nears things that are not there. 
ly talks to himself. 
* to himself without any sensible reason. 
з and phrases in a meaningless way. 
294 not sensible. 

Uses words that aren't understandable. 


Talks whether anyone is listening or not.                           .51
Sometimes giggles in a silly way.                                   .50
Answers sensibly when talked to.                                   -.46
It is difficult to understand what he is saying most of the time.   .45
Drifts off the subject when he talks.                               .41
Makes faces and strange movements that do not make sense.           .41
Does not know where he is.                                          .38
Does not know the names of the ward aides.                          .31
Usually knows what time it is.                                     -.3


The factor called Thinking Disorganization (Ward) appears to
represent a second-order parameter. It is judged to be a second-order
factor because it includes diverse elements previously identified as
distinct, such as motor disturbances, hallucinations, and thinking
disturbances. The underlying disorder seems to be broadly indicative
of schizophrenic disorganization and disintegration as conceived by
Kraepelin and Bleuler. The failure to differentiate the three elements
postulated could be due to a paucity of defining statements included,
or to the inability of raters to discriminate between different facets
of basically queer or bizarre behavior. Lorr, et al. (1954, 1955) have
isolated a similar second-order factor of disorganization of thinking
in interview data. However, the degree of correspondence between
ward and interview factors cannot be ascertained in the absence of
ratings obtained on the patients from both sources. It is for this
reason that the factors are referred to as ward factors even though
they appear very similar to the interview factors of the same name.


Factor PR—Paranoid Projection (Ward) 


Blames the hospital for lack of attention and care.
Complains about the food and care he receives.                      .59
Acts as though the ward attendants are against him.                 .52
Acts as though the hospital is persecuting him.
Often irritable, grouchy or complaining.


The Paranoid Projection (Ward) factor describes a tendency in
patients to feel that people are against them, to attribute hostile
motives to others, and to grouse or complain in dissatisfaction. In the
interview, a similarly labeled factor is defined by delusions of
reference, persecution, and influence (Lorr, et al. 1955, 1957). Again,
identification of the two factors as the same is not possible from the
data collected.


Factor HB—Hostile Belligerence (Ward) 


Often swears and uses obscene language.                             .58
Sometimes threatens to assault others.                              .50
Often shouts and yells.                                             .48
Loses temper when dealing with other patients.                      .48
Becomes noisy and hilarious at times.                               .41
Demands the attention of doctors.                                   .39
Doesn't swear or curse in the presence of aides and doctors.       -.38
Has a sarcastic way of talking to other patients.                   .38
Is likely to hit someone for no particular reason.                  .35
Quick to fly off the handle.                                        .34
Yells at attendant when he is dissatisfied.                         .33


The factor labeled Hostile Belligerence has been identified
repeatedly. Degan (1952) and Lorr, et al. (1955, 1957) have isolated
the factor in ward data. Bostian, et al. (1959) have found a similar
factor in a mental status examination. Guertin (1959) calls this
factor Emotional Control and describes it as involving emotional
explosiveness expressed motorically. In nearly every study, this
syndrome is positively associated with morbid suspicion as exemplified
by the Paranoid Projection factor.


Factor R—Resistiveness (Ward) 


[...] suggestions and requests from aides.
Sometimes does the opposite of what he is asked to do.              .38
Has to be pushed to follow routine.
[...] if he is kidded.


The Resistiveness factor has twice been isolated (Lorr, et al. 1955,
1957) in ward data. There seems to be little doubt that these factors
are the same. Resistiveness represents a tendency, high in catatonics
and paranoid patients, to resist helping, working, or complying with
instructions, as well as a tendency to do the opposite of what is
asked.

А Factor D—Dominance (Ward) 
Dominates the other patients.                                       .74
Tells the other patients what to do.                                .65
Acts superior to other patients.                                    .47
Upsets other patients by the ways he talks to them.                 .38 (HB .31)


The factor of Dominance has not been isolated in ward or
interview data previously. It is most likely a personality characteristic
rather than a psychotic reaction. However, it represents a useful
parameter for describing ward behavior. The bossy,
attention-demanding, aggressive patient is often seen, especially among acute
cases.


Factor O—Overactivity (Ward) 


Is always doing something.
Seems always busy with plans and projects.
Starts conversations with aides to become better acquainted. 88
Will do anything for recreation that comes up. 3
Has to be helped along to stick to any activity. -3
Will sit all day unless directed into activity.


The factor tentatively labeled Overactivity (Ward) is
characterized by overactivity and busyness. The original pattern (called
Manic Excitement) included additional statements which were
excluded from PRP because they were checked in less than eight per
cent of patients. However, the statements remaining do not define an
excitement although they do have their highest loadings on the
expected column. Another study with added variables would be needed
to ascertain whether overactivity actually represents a manic
excitement.


Factor AD—Agitated Depression (Ward) 


Shows real sadness.                                                 .57
Usually worried and nervous.                                        .52 (D [...])
Seems to be unhappy.                                                .44 (D .37)
Seems scared all the time.                                          .43
Usually looks tired and all worn out.                               .36 ([...] .30)
Usually is slow moving and sluggish.                                .33


The Agitated Depression (Ward) factor implies a reaction of [...],
anxiety, and psychomotor retardation. While not as fully defined as
similar factors isolated by Wittenborn (1951) and by Lorr, et al.
(1955, 1957), it is strongly suggestive of a depression with agitation.
Because in earlier studies this parameter is defined principally by
interview variables, it is not possible to say whether the ward factor
is actually the same as the interview factor.


Factor W—Withdrawal (Ward) 


Never says more than three or four words at a time.                 .54 (AD -.34)
Laughs or smiles at funny comments or events.                      -.48
Doesn't take part in back and forth conversation.                   .45
Usually stays by himself.                                           .45
Says thanks when anything is done for him.                         -.4
Is backward about talking to you.                                   .41
Reads the newspaper.                                               -.48
Doesn't mix with other patients.                                    .40
Has no friend on the ward.                                          .40
Never volunteers information about himself.                         .39
Shows occasional interest in news and current events.              -.34
Never asks for anything; waits for things to be given him.          .36
Likes to go for exercise.                                          -.33
Ignores the activities around him.                                  .32
Shows no response to entertainment.                                 .32
Shows real friendliness towards at least one other patient.        -.31
Interested in nothing.                                              .31


The Withdrawal factor (Ward) is well represented in the
Inventory. It has been isolated in interview as well as ward data. The
parameter is defined by statements indicative of taciturnity or
failure to respond in conversation, avoidance of interactions with others,
and lack of interest in the surrounding environment. Very similar
factors have been isolated by Lorr, et al. (1955, 1957, 1960) from
rating scale data.

Except for one study (Guertin & Krugman, 1959) all published
factor analyses of ratings and check lists have been based on
interview data alone or on combinations of interview and ward
observations. It is of some interest to find that Agitated Depression,
Paranoid Projection, and Thinking Disorganization may be
demonstrated in observations collected by nurses and ward attendants.


Summary 


A test was made for the presence of 10 hypothesized patterns in 
the Psychotic Reaction Profile, an 85-statement behavior inventory. 
The sample consisted of 500 male psychotic patients observed and 
rated in 47 psychiatric hospitals. A multiple group factor analysis 
confirmed seven of the patterns postulated. The remaining three pat- 
terns were represented by an eighth more general factor of Thinking 
Disorganization. 


REFERENCES 


Aumack, L. Social Adjustment Rating Scale. Roseburg, Oregon:
    Veterans Administration Hospital, 1957.

Bostian, D. W., Smith, P. A., Lasky, J. J., Hover, G. L., and Ging,
    R. J. "Empirical Observations on Mental Status Examinations."
    Archives of General Psychiatry, I (1959), 253-262.

Degan, J. W. "Dimensions of Functional Psychosis." Psychometric
    Monographs, (1952), No. 6.

Flanagan, J. C. and Schmid, F. W. "The Critical Incident Approach
    to the Study of Psychopathology." American Psychologist,
    (1958), 330.

Guertin, W. H. and Krugman, A. D. "A Factor Analytically Derived
    Scale for Rating Activities of Psychiatric Patients." Journal of
    Clinical Psychology, XV (1959), 32-35.

Guttman, L. "Multiple Group Methods for Common Factor Analysis."
    Psychometrika, XVII (1952), 209-222.

Lorr, M. "The Wittenborn Psychiatric Syndromes: An Oblique
    Rotation." Journal of Consulting Psychology, XXI (1957), 439-44.

Lorr, M., Jenkins, R. L., and O'Connor, J. P. "Factors Descriptive
    of Psychopathology and Behavior of Hospitalized Psychotics."
    Journal of Abnormal and Social Psychology, L (1955), 78-80.

Lorr, M., McNair, D. M., Klett, C. J., and Lasky, J. J. "Confirmation
    of Nine Postulated Psychotic Syndromes." American Psychologist,
    XV (1960), 495. (Abstract)

Lorr, M., O'Connor, J. P., and Stafford, J. W. "A Psychotic Reaction
    Profile." Journal of Clinical Psychology, XVI (1960), 241-245.

Lorr, M., O'Connor, J. P., and Stafford, J. W. "Confirmation of
    Nine Psychotic Symptom Patterns." Journal of Clinical
    Psychology, XIII (1957), 252-257.

McReynolds, P., Ballachey, E. L., and Ferguson, J. T. "Development
    and Evaluation of a Behavior Scale for Appraising the
    Adjustment of Hospitalized Patients." American Psychologist,
    (1952), 340. (Abstract)

Shatin, L. and Freed, E. X. "A Behavioral Rating Scale for Mental
    Patients." Journal of Mental Science, CI (1955), 644-653.

Wittenborn, J. R. and Holzberg, J. D. "The Generality of Psychiatric
    Syndromes." Journal of Consulting Psychology, XV (1951).




EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 1, 1962


ELECTRONIC COMPUTER PROGRAMS AND 
ACCOUNTING MACHINE PROCEDURES 


Edited by 
WILLIAM B. MICHAEL 


University of Southern California 


A General Correlation Program for the IBM 650. ELLIOT M. CRAMER

CORR1 and CORR2—Correlation Routines for the IBM 7070. GARY LOTTO

An IBM 650 Program for Item Analysis of Dichotomized Variables.
    SARAH GABRIEL SAUNDERS, L. WESLEY GADDIS, AND WILLIAM B. MICHAEL

[remaining entries illegible in the source]

Computer Program for the Meehl-Dahlstrom MMPI Profile Rules.
    BENJAMIN KLEINMUNTZ AND L. BARTON ALEXANDER

Comparison of Some Computer Techniques for Factor Analytic
    Rotation. R. K. EYMAN, H. F. DINGMAN, AND C. E. MEYERS




A GENERAL CORRELATION PROGRAM 
FOR THE IBM 650 


ELLIOT M. CRAMER¹


Biometric Laboratory 
The George Washington University 


Introduction 


This program has been designed to compute rapidly a matrix of
intercorrelations or cross-correlations, as well as means and
standard deviations, where there may be missing observations for some
or all of the variables. An intercorrelation matrix is defined as the
matrix of all possible pairs of correlations within a set of
variables; a cross-correlation matrix is defined as the matrix of all
possible correlations between a variable in one set and a variable in
a second set. The number of variables is limited only by
considerations of time and tape capacity; the number of observations
must be less than 10,000. Sequence checking of cards within variable
is included automatically and variable number may be sequence checked
if desired. The program is self-restoring and any number of matrices,
the characteristics of which are punched on a control card, may be
successively loaded, computed, and punched. Several input and output
forms are provided. Index registers and at least three tape units are
required. The method may be adapted for other computers to handle a
large number of variables with minimal storage.


¹ This program was written while the author was at the National
Institutes [remainder of note illegible].
* The author is indebted to Dean J. Clyde for several suggestions.


Method

The computation formulas are

$$m = \frac{\sum X}{n}, \qquad \sigma^2 = \frac{\sum X^2}{n} - m^2,$$

$$r_{xy} = \frac{\sum XY/n - m_x m_y}{\sigma_x \sigma_y},$$

where $n$ is the number of paired non-missing observations and the
summations are over such observations. The data are read in on
cards one variable at a time and $\sum X$, $\sum X^2$, and $n$ are
computed simultaneously. The data are written onto magnetic tape in
records containing 25 observations along with the above sums for that
record only. After each variable has been read in, $m$ and $\sigma$
are computed. As will be seen, the above procedure greatly speeds up
the computation of correlations when there are missing observations.
After all the observations have been read onto tape one, the following
iteration takes place.


1. Write the first variable from tape one onto tape three and
   rewind tape three.

2. Read a record from tape one and from tape three, computing
   $\sum XY$ and reducing $n$, $\sum X$, and $\sum X^2$ by 1, $X_i$,
   and $X_i^2$ when $Y_i$ is missing but $X_i$ is not; also reduce
   $\sum Y$ and $\sum Y^2$ by $Y_i$ and $Y_i^2$ when $X_i$ is missing
   but $Y_i$ is not. Write the record from tape one onto tape two.

3. Continue reading tape as in 2 until all observations have been
   read for this pair of variables and then compute $r_{xy}$. Rewind
   tape three.

4. Return to step 2, initializing, and thus obtain the correlation
   of the variable on tape three with the next variable on tape
   one.

5. When tape one has been completely read in this manner, rewind
   both tape one and tape two, interchanging their names, and
   return to step 1.
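
The sum-adjustment idea in step 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original IBM 650 routine: the function name `pairwise_corr` is hypothetical, missing observations are represented by `None` rather than by a flag digit, and the tape handling is replaced by in-memory lists. The per-variable totals are formed first, then corrected only where one member of a pair is missing.

```python
import math

def pairwise_corr(x, y):
    """Correlation over the pairs in which both observations are present.

    Hypothetical sketch: missing values are None (the original cards
    flag them with a 1 in the last digit of the data word).
    """
    # Totals over each variable's own non-missing values, as they would
    # be accumulated when the variable is first read in.
    n = sum(v is not None for v in x)
    sx = sum(v for v in x if v is not None)
    sxx = sum(v * v for v in x if v is not None)
    sy = sum(v for v in y if v is not None)
    syy = sum(v * v for v in y if v is not None)
    sxy = 0.0
    for xi, yi in zip(x, y):
        if xi is not None and yi is not None:
            sxy += xi * yi
        elif xi is not None:      # Y missing, X present: back X out
            n -= 1
            sx -= xi
            sxx -= xi * xi
        elif yi is not None:      # X missing, Y present: back Y out
            sy -= yi
            syy -= yi * yi
    mx, my = sx / n, sy / n
    var_x = sxx / n - mx * mx
    var_y = syy / n - my * my
    r = (sxy / n - mx * my) / math.sqrt(var_x * var_y)
    return r, n
```

When few observations are missing, almost every pass through the loop does a single multiply-and-add, which is presumably why the procedure is described as fast. The sketch does not guard against an empty set of pairs or a zero variance.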


In this manner an intercorrelation matrix is obtained with all
data and intermediate results stored in core and with the main
instruction loop well optimized. If no observations are missing, means
and standard deviations are stored on tape with each variable, thus
making the computations somewhat faster.

To obtain the cross-correlation matrix, a few instructions are
changed so that the two sets of variables are stored on tapes one
and two; step two is changed to read tape two instead of tape three
and the "write" instructions are omitted. Tape one is never
rewound and the resulting nonsymmetric matrix is obtained. This is
useful for correlating a small number of variables with a large
number.



Computations are in fixed point single precision with the
significant digits in the left of each word. Intermediate results are
shifted and rounded by an amount indicated on a control card. This
allows maximal accuracy while preventing overflow in the
accumulation of $\sum X^2$ and $\sum XY$.

CARD FORM

Word 1      xx yyyy zzzz
            xx is the matrix identification.
            yyyy is the variable number.
            zzzz is the card number within variable.
Words 2-6   xx xxxx 000z
            xx xxxx is a score on a particular variable. All
            observations on a card are for the same variable.
            z is 1 for a missing observation and 0 otherwise.

The decimal is aligned for each variable but need not be the same
for all variables. If there are fewer than six digits in the data,
zeros should be punched to the right. Two digit data, for example,
would be punched as xx 0000 0000. Five per card input is normal, but
the number of observations per card may be from one to seven provided
[remainder of sentence illegible]. If the matrix is nonsymmetric, the
group with the smaller number of variables should be entered first to
maximize speed.


CONTROL CARD FORM

Word 1      xx yyyy zzzz
            xx is the matrix identification.
            yyyy is the number of variables in the first group. (If
            the matrix is symmetric there is only one group.)
            zzzz is the number of variables in the second group.
Word 2      x0 000y zzzz
            x is the number of positions shifted for the accumulation
            of [words illegible]; [value illegible] will always be
            sufficient for 10^4 observations.
            y is the number of data words per card.
            zzzz is the number of observations per variable including
            missing data.
Word 3      00 abcd efgh
            The following are a series of binary decisions with 1
            indicating yes and 0 indicating no.
            a. punch results in 6 per card matrix form
            b. punch cards in 1 per card form
            c. write tape in 1 per record form
            d. matrix is symmetric
            e. save original data on tape
            f. input is on cards
            g. data may be missing
            h. variables are numbered in sequence from 0001


OUTPUT FORM

There are three possible types of output. The one per card form
has

Word 1      xx yyyy zzzz
            xx is the matrix identification; yyyy and zzzz are the
            numbers of the variables involved.
Word 2      correlation coefficient with nine decimal places.
Word 3      variance of yyyy or standard deviation if no missing data.
Word 4      variance of zzzz or standard deviation if no missing data.
Word 5      mean of yyyy
Word 6      mean of zzzz
Word 7      number of paired observations

n is scaled 10 and the others are as the original data except that
variances are scaled twice that of original data. The tape output
form is the same as the one per card form.

The six per card form has

Word 1      xx yyyy zzzz
            xx is the matrix identification; yyyy is zero or the
            number of one of the variables involved in the six
            correlations; zzzz is a sort number indicating the group
            of six variables correlated with yyyy.
Word 2      90 0000 000x
            x is 1, 2, 3, or 4 indicating that the card contains the
            number of observations, means, unbiased standard
            deviations, or correlation coefficients, respectively.
Words 3-8   Successive values with r scaled 7, n scaled 10, and m and
            σ scaled the same as the original data. For the
            nonsymmetric case the first values of m, σ, and n for each
            group will be on a new card.

For both card forms, end cards are punched with

Word 1      xx 0000 9999


LISTING PROCEDURE

The 6 per card form may be listed as an upper triangular matrix
in the following way.

1. Sort out cards with 9 in column 11.
2. Sort out 4 in column 20 from these cards if means, standard
   deviations, and number of observations are not to be included
   in the matrix.
3. Sort these cards on columns 7-10.
4. Combine groups and sort on columns 1-2.
5. List on eight word board.


Timing

Approximate timing formulas in minutes for [word illegible] three
digit data are

No missing data, symmetric matrix:
$$T = .001\,(1.4NV^2 + 1.3V^2 + NV)$$

Missing data, symmetric matrix:
$$T = .001\,(1.9NV^2 + 2.8V^2 + NV)$$

No missing data, non-symmetric matrix:
$$T = .001\,(2.0NV_1V_2 + 2.3V_1V_2 + N(V_1 + V_2))$$

Missing data, non-symmetric matrix:
$$T = .001\,(3.6NV_1V_2 + 5.4V_1V_2 + N(V_1 + V_2))$$

In the above, $N$ is the number of observations; $V_1$ and $V_2$ are
the numbers of variables in each group.


OPERATING INSTRUCTIONS 


1. Load tapes numbered 8010, 8011, and 8012 for intermediate 
Storage; for tape output load 8013; to save data on tape load 
   8014. Use standard eight word board.

2. Ready punch feed. 

3. Load program with transfer to 0500 and follow with control 
card and data. Additional problems may follow with their own 
control cards. Programmed stops are 0001 to 0011. All but 
0010 are tape errors and pressing start key will backspace and 
try again. A stop of 0010 indicates a card sequence error. The 
actual card number will be in the upper accumulator while the 
lower will contain the expected number. To correct, cards 
should be run out and reordered beginning with the number in 
the lower. Pressing start key will continue card reading. To 
continue as if the card read were correct, set 8000 to minus 
and transfer to 0903. If no sequence check within variable is
desired, set 8000 to minus originally.




CORR1 AND CORR2—CORRELATION ROUTINES 
FOR THE IBM 7070 


GARY LOTTO 
University of Pittsburgh 
and 


American Institute for Research 


Purpose 


CORR1 and CORR2 are designed to calculate sums, sums of
squares, and all possible sums of cross-products for up to 130
variables of four digit data. They convert these results to a printed
or punched matrix of correlation coefficients. CORR1 is designed
to read input from tape, CORR2 from cards.


Method of Functioning 


CORR1 reads its tape input into two alternating areas in order
to conserve the time of tape reading for its calculations. While the
tape is reading into one area, the sums, sums of squares, and sums
of cross-products for the other area are being calculated. CORR2
reads its data cards one at a time. It performs all calculations
possible upon the data in that card before progressing to read the
next card. Up to 15 variables may appear on a card, up to 9 cards per
case. When each card is read, it is checked for proper sequencing
of cards within the case, and for identification the same as the other
cards of this case. If an error is encountered on a middle card within
a case, a special "clean-up" routine resubtracts the quantities
already computed for this case from the cumulations in storage, and
the current case is bypassed entirely.

Both programs use the technique of including a "dummy variable,"
variable 0, with a constant value of +1. Thus the cross-products of
each variable i with all variables following and including it,
i, i+1, i+2, . . ., produce all sums, sums of squares, and
sums of cross-products in the case of CORR1. The cross-products of
all variables preceding and including variable i produce the same
effects. This technique is used in the program CORR2 because of
the special one-card-at-a-time feature. N is also produced as the
sum of cross-products of variable 0 with itself.

Upon recognizing that all of the data have been processed (by a
tape end-of-file condition for CORR1 or by reading a card punched
"LAST" for CORR2), the appropriate program converts the fixed-
point matrix of N, sums, sums of squares, and sums of cross-prod- 
ucts to a floating point matrix of N, means, standard deviations, and 
correlation coefficients. It prints out the correlation matrix, in 3 
decimal place fixed point output format, as a symmetric matrix 
with blank diagonals, and two additional vectors, one of means and 
one of standard deviations. If any variable has zero variance, its 
row and column in the output matrix will be left blank, but its mean 
and zero standard deviation will be reported. If the number of 
variables exceeds 19 (the maximum number of columns available 
to the printer which still leaves room for the vectors of means and 
standard deviations), the entire matrix is printed out in blocks 
which may be pasted together to form the large resultant matrix. 
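
The dummy-variable technique and the later conversion to means, standard deviations, and correlations can be illustrated with a short Python sketch. It is not the 7070 code: the function name `corr_from_crossproducts` is hypothetical, floating point is used throughout instead of a fixed-point accumulation, and the standard deviations are computed with divisor N, which is an assumption here.

```python
import math

def corr_from_crossproducts(cases):
    """cases: list of equal-length lists of numbers, one list per case."""
    v = len(cases[0])
    # s[i][j] holds sums of cross-products of the augmented variables;
    # augmented variable 0 is the constant +1 (the "dummy variable").
    s = [[0.0] * (v + 1) for _ in range(v + 1)]
    for case in cases:
        z = [1.0] + list(case)
        for i in range(v + 1):
            for j in range(i, v + 1):      # variable i with i, i+1, ...
                s[i][j] += z[i] * z[j]
    n = s[0][0]                            # N is the sum of 1 * 1
    means = [s[0][j] / n for j in range(1, v + 1)]
    sds = [math.sqrt(max(0.0, s[j][j] / n - means[j - 1] ** 2))
           for j in range(1, v + 1)]
    # None marks a blank cell: the diagonal, and any row or column
    # belonging to a variable with zero variance.
    r = [[None] * v for _ in range(v)]
    for i in range(1, v + 1):
        for j in range(i + 1, v + 1):
            if sds[i - 1] > 0 and sds[j - 1] > 0:
                cov = s[i][j] / n - means[i - 1] * means[j - 1]
                value = cov / (sds[i - 1] * sds[j - 1])
                r[i - 1][j - 1] = r[j - 1][i - 1] = value
    return n, means, sds, r
```

Row 0 of the accumulated matrix supplies N and the plain sums, and the diagonal supplies the sums of squares, so a single triangular loop yields every quantity needed for the conversion.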


Timing 
Tables I and II show the timing of the cumulation phase of
CORR1 and CORR2 as a function of the number of variables and
the average number of digits in the data. The transfer routine takes
up to about 115 minutes for 130 variables. The printout or
punch-out of the matrix proceeds at output speed.


Machine Requirements 


The program is designed to operate on an IBM 7070 with 10,000
words of core storage and automatic floating point arithmetic.
Modification for a 5,000 word machine is simple, and would reduce
the maximum number of variables to about 85. The program might
similarly be modified to operate without automatic floating point
arithmetic by subroutine simulation, with only a small reduction in
the maximum number of variables. CORR1 requires at least one
tape unit, while CORR2 requires a card reader. Both programs
require either a 7500 printer or card punch. All control panels used
are the standard utility panels.


Input Format 


Parameter Card 
This card precedes the data cards for CORR2, and for CORR1 is
the only card used. It specifies the number of variables.


TABLE I. CORR1 Operating Times. [Graph: seconds per case plotted
against number of variables.]



Col.      Contents
1-7       Zeros
8-10      V, the number of variables
11-80     Zeros

12-punches should appear in cols. 10, 20, 30, 40, 50, 60, 65, 70,
and 80.


TABLE II. CORR2 Operating Times. [Graph: seconds per case plotted
against number of variables.]

Data Cards (for CORR2)

Each case is represented by a card or set of cards containing the
observations for that case on V variables. Each card contains up to
15 variables. Thus, if V = 31, three cards are required for each case.

Col.      Contents
1-4       Alphameric subject identification. Should be the same for
          all cards for the same subject.
5         Sequence number, identifying card number within case.
          The first card of each case should be numbered "1,"
          the second card "2," etc.
6-10      Variable 1 field (for card 1), variable 16 field (for
          card 2), etc.
11-15     Variable 2 field (for card 1), variable 17 field (for
          card 2), etc.
16-20     Variable 3 field (for card 1), variable 18 field (for
          card 2), etc.
. . .
76-80     Variable 15 field (for card 1), variable 30 field (for
          card 2), etc., where a 5-column variable field is defined
          as a sign followed by 4 digits. Positive signs may be left
          blank, as may be leading zeros in the data.
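
As a rough illustration of this layout, the following Python sketch splits one 80-column card image into its identification, sequence number, and fifteen signed fields. The function name `parse_corr2_card` is hypothetical, and two details are assumptions not stated in the text: a negative sign is taken to be a literal "-" in the first column of the field, and blanks among the digits are read as zeros.

```python
def parse_corr2_card(card):
    """Split one 80-column CORR2 data card into (ident, sequence, values)."""
    card = card.ljust(80)[:80]             # pad or trim to exactly 80 columns
    ident = card[0:4]                      # cols 1-4: subject identification
    sequence = int(card[4])                # col 5: card number within case
    values = []
    for k in range(15):                    # cols 6-10, 11-15, ..., 76-80
        field = card[5 + 5 * k: 10 + 5 * k]
        sign = -1 if field[0] == "-" else 1    # assumed sign convention
        digits = field[1:].replace(" ", "0")   # blank leading zeros
        values.append(sign * int(digits))
    return ident, sequence, values
```

On the last card of a case, only the first few fields hold variables; a caller would keep `values[:remaining]` and ignore the rest.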

Data Tape (for CORR1)

Each case is represented by a single tape record containing
exactly V words.

Word      Contents
1         The value of variable 1
2         The value of variable 2
. . .
V         The value of variable V


Transfer Card (for CORR2)

This card follows the data, and is used by the program to signal
end of data.


TEST PROBLEM—INPUT DATA

[Only the parameter card, the identification and sequence columns,
and one data field per card are legible in the source; the remaining
data fields are omitted.]

0000000019
 11   43
 12   43
 21   43
 22   43
 31   40
 32   43
 41   34
 42   43
 51   45
 52   13
 61   13
 62   43
 71   56
 72   45
 81   45
 82   45
 91   43
 92   13
101   46
102   45
111   46
112   09
121   46
122   45
131    5
132    1
141   78
142   45
151   78
152   29
161   78
162   45
171   49
172   93
181   97
182   37
191   88
192   80
201   78
202  -02
211   98
212
221   79
222   01
231   79
232   73
241   78
242   79
251   78
252   10
LAST

TEST PROBLEM—CORR 1 OR CORR 2 OUTPUT (N = 25)

[Correlations are printed to three decimal places with the decimal
point omitted. Only the blocks for columns 3-7 and 14-18 are legible
in the source; columns 1, 2, 8-13, and 19 and the vectors of means
and standard deviations are omitted. Diagonal cells are blank. Values
are given as they appear, including a few symmetric entries that
disagree; "..." and "?" mark illegible entries.]

Var     3     4     5     6     7    14    15    16    17    18
  1   171  -185  -139  -010   116   024   064   002  -048   133
  2   525   004   129  -014  -144   149   246   054  -083  -005
  3         696   661   309  -157   404  -092  -141   193   108
  4   696         609   151  -191   298  -209  -182   096   019
  5   661   609         748   298   140  -094  -102   120  -089
  6   309   151   748         725   069   058  -034   292   150
  7  -157  -191   298   725        -100   002   246   240   244
  8  -111  -246   096   432   593   072   219   009   337   331
  9  -334  -510  -353  -148   184  -420   192   231   009   078
 10  -141  -370  -195  -079   193  -199   119   226   404   ...
 11   129  -016  -031  -189  -208  -?73  -257   151   007   270
 12   230   320   249   025  -187   097  -321  -016  -079   003
 13   540   431   226  -004  -252   536  -029  -335  -016   196
 14   404   298   140   069  -100         104   001   289   271
 15  -092  -269  -094   058   002   104        -103  -031   131
 16  -141  -182  -102  -034   246   061  -103         468   270
 17   193   096   120   292   240   239  -031   468         520
 18   108   019  -089   150   244   271   131   270   520
 19   274   192   108   201   190   227   075   120   629   797



Col.      Contents
1-4       The word "LAST"
5         Blank
6-80      Any valid alphamerical information, including blanks.
          But col. 65 may not contain any symbol with a 12-punch.


Transfer Tape Record (for CORR1) 


The program will recognize an end-of-data condition when it
reads a tape mark on the CORR1 data tape.


Operating Instructions 

Console settings: priority channels A and B set at N-OFF;
alteration switch settings are irrelevant.

I/O settings: standard utility panels are used throughout, reader
and printer alteration switches set at AAAA, punch switches (if
used) at BBBB.

Loading: enter into 0000: +69 0111 0004, into 0001: -01 0003
0010; depress computer reset, start; insert program (with punch
option deck included, if desired) into read hopper, followed by
parameter card and, if CORR2 is used, data cards and transfer
card. Additional sets of parameter cards, data cards, and transfer
cards may follow the first. If CORR1 is used, the input tape is
tape 10. Depress reader start.


Test Problem 


A test problem is included to demonstrate output formats. The
input presented is that of CORR2. CORR1 input corresponding to
it would be an unlabelled tape without identification or sequence
information.




MR1—MULTIPLE REGRESSION ANALYSIS
ON THE IBM 7070


GARY LOTTO
University of Pittsburgh
and
American Institute for Research


Purpose 


MR1 will compute the appropriate statistics for a multiple
regression analysis for up to 129 independent variables (i.v.'s). It
operates in a stepwise fashion, first computing these statistics for
the one "best" i.v. It then selects the "best" of the remaining
variables from the pool of independent variables (from which the
first variable selected has been partialed out). In this fashion,
i.v.'s may be designated as "in" or "out" of regression, and the
program attempts to approximate the optimum set of i.v.'s by "entering
into regression," one variable per cycle, all i.v.'s satisfying a
specified criterion of significance. During each cycle, the program
may also "delete from regression" any variable previously entered
which, as a result of other variables being entered, is no longer
significant.


Method of Functioning 


MR1 operates in two phases. Phase 1 moves the matrix produced
by either of the programs CORR1 or CORR2 to satisfy the special
needs of MR1. Standard deviations are converted to unbiased
estimates; means and new standard deviations are moved to special
work areas while the diagonals of the matrix are set to unity. Some
of the basic initializations necessary to phase 2 are also performed.

Phase 2 performs the stepwise regression analysis. The original
matrix is analyzed for the best single i.v., and that is entered into
regression. The matrix is then reduced. That is, the effects of the
variable entered into regression are partialed out of the covariances


remaining in the matrix. The reduced matrix, and all future reduced 
matrices on successive cycles, are analyzed in order to determine 
two things: that variable not in regression which will make the 
largest potential contribution to the multiple correlation coefficient, 
and that variable in regression which is making the smallest
contribution to the multiple. If the least significant variable in
regression is less significant than a prespecified parameter, this cycle will
delete it from regression. If the most significant variable not in 
regression is more significant than a prespecified parameter, this 
cycle will enter it into regression. If neither of the above conditions 
is true, the analysis is ended. 

Entrance of a variable into regression involves partialing out 
from the residual matrix of covariances the effects of the variable 
being entered. Deletion of a variable from regression involves re- 
entering its effects into these covariances. Thus, the cycle is re- 
started, and the program iterates on this cycle until the decision to 
end mentioned above is reached. 

During each cycle, there are many bits of information easily 
available from the residual matrix. The program supplies the multi- 
ple correlation coefficient, the standard error of estimate, an F test 
for the significance of the multiple, the betas, their standard errors, 
the b-weights, their standard errors, bo, and an F test for the sig- 
nificance of each variable. 
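
The enter-and-delete cycle described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the MR1 code: the names `sweep` and `stepwise` are hypothetical, the input is taken to be a full correlation matrix with unit diagonal, each partialing step is done by a Gauss-Jordan pivot (which undoes itself when repeated on the same variable), and the F ratios use the usual one-degree-of-freedom forms. The sketch returns only the entered variables, their betas, and the multiple correlation.

```python
import math

def sweep(a, k):
    """Gauss-Jordan pivot of square matrix a on (k, k), in place.

    Pivoting a second time on the same k restores the matrix, so the
    same routine both enters and deletes a variable.
    """
    d = a[k][k]
    m = len(a)
    for i in range(m):
        if i != k:
            for j in range(m):
                if j != k:
                    a[i][j] -= a[i][k] * a[k][j] / d
    for j in range(m):
        if j != k:
            a[k][j] /= d
            a[j][k] = -a[j][k] / d
    a[k][k] = 1.0 / d

def stepwise(r, n, dv, f_enter, f_delete, tol=1e-8):
    """r: correlation matrix, n: number of cases, dv: index of the d.v."""
    a = [row[:] for row in r]
    inside = []                               # variables now in regression
    for _ in range(4 * len(r)):               # hard cap guards against cycling
        q = len(inside)
        resid = max(a[dv][dv], 1e-12)         # 1 - R^2 for the current set
        # Variable in regression making the smallest contribution.
        worst = min(inside, key=lambda k: a[k][dv] ** 2 / a[k][k],
                    default=None)
        if worst is not None:
            f = (a[worst][dv] ** 2 / a[worst][worst]) * (n - q - 1) / resid
            if f < f_delete:
                sweep(a, worst)               # re-enter its effects
                inside.remove(worst)
                continue
        # Variable not in regression with the largest potential contribution.
        best, f_best = None, f_enter
        if n - q - 2 > 0:
            for k in range(len(r)):
                if k == dv or k in inside or a[k][k] < tol:
                    continue                  # skip linear combinations
                v = a[k][dv] * a[dv][k] / a[k][k]
                left = a[dv][dv] - v
                f = math.inf if left <= 0 else v * (n - q - 2) / left
                if f > f_best:
                    best, f_best = k, f
        if best is None:
            break                             # neither condition holds
        sweep(a, best)                        # partial out its effects
        inside.append(best)
    betas = {k: a[k][dv] for k in inside}
    multiple_r = math.sqrt(max(0.0, 1.0 - a[dv][dv]))
    return inside, betas, multiple_r
```

Choosing `f_delete` no larger than `f_enter` keeps a variable from being deleted in the cycle right after it is entered, since the two F ratios coincide at that moment.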


Timing 
Table I shows the timing of each cycle, not including printout 
time, as a function of the number of variables in the matrix. 


Machine Requirements 


The program is designed to operate on an IBM 7070 with 10,000 
words of core storage and floating point arithmetic. Modification for 
a 5,000 word machine is simple, and would reduce the maximum 
number of variables to about 85. 


Input Format 


The format for input is exactly the same as that for the programs
CORR1 and CORR2, mentioned elsewhere in this journal. The
program MR1 is a deck which immediately follows the parameter card
for CORR1 or "LAST" card for CORR2 and operates upon the data
in core storage that have been produced by the correlation program.


TABLE I. MR1 Operating Times (exclusive of output). [Graph: time per
cycle plotted against number of variables.]

Contents of the parameter card fields (column numbers are illegible
in the source):

    Variable number of the d.v.
    Floating point maximum F for deleting a variable from regression
    Floating point minimum F for including a variable into regression



Note that if columns 21-40 contain all zeros, all variables will be 
entered into regression, except those which are a linear combina- 
tion of variables already in regression. 


Test Problem 


A test problem is included to demonstrate output formats. The
data are assumed to be those presented for the writeup of CORR1
and CORR2, with the parameter card modified appropriately. The
parameters used are variable 1 for the d.v., and 2.00 as the critical
F values for inclusion and deletion.


TEST PROBLEM—MR 1 OUTPUT

MULT R     SE EST     DF       F
 .0000    24.6308     24      .00
VAR      BETA    SE BETA         B       SE B       F
  0                        58.5200

VARIABLE 2 INCLUDED
MULT R     SE EST     DF       F
 .5753    20.5803     23    11.38
VAR      BETA    SE BETA         B       SE B
  2     .5753      .1706     .4058      .1203
  0                        35.9226

VARIABLE 8 INCLUDED
MULT R     SE EST     DF       F
 .7194    17.8700     22    11.80
VAR      BETA    SE BETA         B       SE B
  2     .5879      .1482     .4148      .1045
  8     .4321      .1482     .3761      .1290
  0                        22.2599

VARIABLE 5 INCLUDED
MULT R     SE EST     DF       F
 .7649    16.9612     21     9.87
VAR      BETA    SE BETA         B       SE B
  2     .6227      .1419     .4393      .1001
  5    -.2635      .1425    -.2125      .1149
  8     .4585      .1413     .3992      .1230
  0                        28.0103

VARIABLE 15 INCLUDED
MULT R     SE EST     DF       F
 .7989    16.2293     20     8.82
VAR      BETA    SE BETA         B       SE B
  2     .6[illegible]  .1414     .4870      .0997
  5    -.3013      .1381    -.2430      .1114
  8     .5185      .1397     .4514      .1216
 15    -.2477      .1446    -.2291      .1387




FORTY VARIABLES PHI COEFFICIENT 
CORRELATION AND CHI-SQUARE PROGRAM 
FOR THE EXPANDED IBM 650 


А. W. BENDIG 
University of Pittsburgh 


Purpose 


This program computes phi correlation coefficients among n
variables where n is equal to or less than 40. The output gives the
size of the sample, the number and proportion of cases falling in
one category of each of the dichotomous variables, a chi-square test
of the significance of the relationship between each pair of variables,
and the phi coefficient, in either fixed or floating point arithmetic,
between each pair. Data cannot be missing on any of the variables:
if data are missing the case must be omitted. The variables must
be dichotomous with each X_i, where i = 1, 2, . . ., n (the number
of variables), scored either as plus one or as zero.


Machine Requirements 


IBM 650 with indexing accumulators, automatic floating point
arithmetic, and immediate access storage.


Input Format 


DATA CARDS. Each of the variables is punched as zero (0) or
one (1) in columns 1-40 with one data card per case. If n is less
than 40, the remaining columns (columns n+1 to 40) must be filled
in with numerical punches, preferably zeros. Columns 41-80 may
be used for identification or left blank. No overpunches are needed.

Columns    1     2     3    . . .    40     41-80
Entry      X_1   X_2   X_3  . . .    X_40   Ident. or blank.

where X_i is either one or zero.



PARAMETER CARD. This card specifies the number of variables
(n) appearing on each data card.

Columns    1  2  3  4  5  6  7  8  9  10
Entry      0  0  [illegible]  0  0  0  0  n  n

Overpunches (Y or 12) must appear in columns 3 and 10.

PUNCH CARD. The last data card should be followed by a card
transferring control to location 0250, the start of the computing and
punching routine.

Columns    1  2  3  4  5  6  7  8  9  10
Entry      [illegible in source]

Overpunches (Y or 12) in columns 1 and 10.

CONTINUE CARD. It may be desirable to effect an intermediate
punch-out at some point before the remaining data cards
are read and have the data read-in continue after the punch-out.
A Punch Card should be placed in the data deck after the last data
card to be included in the intermediate punch-out and the Punch
Card be followed by a card transferring control to location 0078:

Columns    1  2  3  4  5  6  7  8  9  10
Entry      [illegible]  0  0  7  8

Overpunches (Y or 12) in columns 1 and 10. After this card is read,
the next set of data cards will be read and summed with the data
read from the preceding data deck or decks.


Card Order. The several types of cards should be placed in the 
533 read hopper in the following order: 


Program deck 
Parameter card 
Data deck (sample A) 
Punch card (punch-out sample A) 
Continue card 
Data deck (sample B) 
Punch card (punch-out combined samples A and B) 
Continue card 
Data deck (sample C) 
Etc.


Output Format 


After transferring control to location 0250, the program will punch 
the following statistics in 8-words-per-card format with one card or 
line per pair of variables: 


Word   Format          Description
1      00 00ii 00jj    Identification of variables
2      blank
3      00 0000 0NNN    Total number of cases
4      00 0000 0XXX    Number of cases scored one on variable i
5      00 0000.pppp    Proportion of cases scored one
6      00 0000 0YYY    Number of cases scored one on both
                       variables i and j
7      00 0xxx.xxxx    Chi-square (significance of phi)
8      00 0000.rrrr    Phi coefficient between variable i and j


The output phi coefficients can be either in fixed or floating-point
arithmetic by an appropriate setting on the 650 console: fixed,
70 1651 9999; floating, 70 1651 9989.
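
The quantities in this output can be reproduced from 0/1 data with the standard fourfold-table formulas. The Python sketch below is an illustration only: the function name `phi_chi_square` is hypothetical, and it assumes the uncorrected relation chi-square = N times phi squared (one degree of freedom); whether the 650 program applies a continuity correction is not stated in the text.

```python
import math

def phi_chi_square(cases, i, j):
    """cases: list of 0/1 lists, one per case; i, j: variable indices.

    Returns (N, X_i, p_i, Y_ij, chi_square, phi), roughly mirroring
    words 3 to 8 of the punched output.
    """
    n = len(cases)
    n_i = sum(c[i] for c in cases)            # cases scored one on i
    n_j = sum(c[j] for c in cases)            # cases scored one on j
    n_ij = sum(c[i] * c[j] for c in cases)    # cases scored one on both
    denom = math.sqrt(n_i * (n - n_i) * n_j * (n - n_j))
    phi = (n * n_ij - n_i * n_j) / denom
    chi_square = n * phi * phi                # uncorrected, 1 d.f.
    return n, n_i, n_i / n, n_ij, chi_square, phi
```

A variable on which every case has the same score makes the denominator zero; the sketch leaves that case unhandled.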


Operating Instructions 

Place the program deck, parameter card, data deck, and punch
card in the 533 read hopper. The 650 console settings are:

Storage Entry: 70 1651 99X9 (see above)
Programmed: Run
Half Cycle: Run
Address Selection: Immaterial
Control: Run
Display: Program Register
Overflow: Stop
Error: Stop
Board: University II

Press Computer Reset and Program Start on the console, and the
start button on the 533.


Programmed Stops

None. A missing punch in columns 1-40 on a data card will result
in a nonprogrammed stop with 67 165X 0163 showing in the console
display. After correcting the card, the program may be re-started
by manually transferring to location 0006.


Time Estimate 


Time per case depends upon the proportion of variables scored 
“one” on each card: fewer “ones” and more “zeros” speed up the 


data processing. For cases with about half of the variables scored 
“one,” some representative times are: 


Number of variables (n) Seconds per case 


10 1.1 
20 2.6 
30 4.7
40 7.4 




EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 1, 1962


AN IBM 650 PROGRAM FOR ITEM ANALYSIS 
OF DICHOTOMIZED VARIABLES 


SARAH GABRIEL SAUNDERS 
L. WESLEY GADDIS 


Aerojet General Corporation, Azusa 
AND 
WILLIAM B. MICHAEL 


University of Southern California 


In item analysis procedures involving fourfold tables that represent
the four possible combinations of classification on two dichotomized
variables, or dimensions, it is customary for one dimension (say k)
to constitute the existence of high or low groups on a criterion
variable, or the presence of two contrasting clinical or occupational
groups, and for the other dimension (say j) to designate the
contrasting modes of response such as "right or wrong," "yes or no,"
"agree or disagree," or "like or dislike." Once such a table has been
formed, as shown in Figure 1, the investigator is frequently
interested in (1) the calculation of chi square (χ²) to indicate the
existence of a significant degree of association between category of
criterion membership and mode of response, (2) the computation of the
phi coefficient (φ) expressing the degree of relationship between the
two variables, (3) the determination of the marginal values of the
fourfold table in terms of frequencies of responses, (4) the
conversion of marginal values to proportions of the total frequency,
and (5) the transformation of frequencies in each of the cells of the
fourfold table to the proportions that they represent of the sum of
frequencies in the row or column containing the cell. The table could
also be conceptualized to form a basis for obtaining inter-item phi
correlation coefficients (between items j and k) in that the
dichotomized responses for item k would constitute a dimension or
variable to replace those obtained for the dimension described by the
contrasted criterion groups.


                                   Dimension j (0, 1)
                                    Mode of Response
                                 0 (No)      +1 (Yes)

Dimension k (0, 1)   +1 (High)     B            A        A + B
Contrasted
Criterion             0 (Low)      D            C        C + D
Groups

                                 B + D        A + C      N = A + B + C + D

Figure 1. Fourfold Table for Item Analysis


Purpose. With respect to the entries as designated in Figure 1, it 
will be the purpose of the program for given values A, B, C, and D 
(which are enumerated from cards punched 0 and 1 with respect to 
the response variable) to obtain 


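(1) \chi^2 = \frac{N\left(|AD - BC| - \frac{N}{2}\right)^2}{(A + B)(C + D)(A + C)(B + D)}


(2) \phi = \sqrt{\chi^2 / N}


(3) AD - BC (indicating direction of relationship in terms of
plus or minus values)


(4) the marginal frequencies, A + B, C + D, A + C, and B + D


(5) the marginal proportions as defined by

p_j = \frac{A + C}{N}, \quad q_j = \frac{B + D}{N}, \quad p_k = \frac{A + B}{N}, \quad \text{and} \quad q_k = \frac{C + D}{N}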




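(6) the proportion of individuals in each cell of the fourfold
table relative to the frequency of individuals in the row
or column containing the cell as given by

\frac{A}{A + B}, \quad \frac{B}{A + B}, \quad \frac{C}{C + D}, \quad \frac{D}{C + D} \quad \text{(relative to rows)}

\frac{A}{A + C}, \quad \frac{C}{A + C}, \quad \frac{B}{B + D}, \quad \frac{D}{B + D} \quad \text{(relative to columns)}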
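
The six kinds of output listed under Purpose can be illustrated with a short Python sketch. It is not the "Bill" routine: the function and key names are hypothetical, the cell layout (A = high group answering yes, D = low group answering no) is read off the marginal sums in Figure 1, and the continuity correction of N/2 and the relation phi = sqrt(chi-square / N) are assumptions that can be switched or replaced.

```python
import math


def fourfold_stats(a, b, c, d, continuity=True):
    """Chi-square, phi, marginals, and proportions for a fourfold table."""
    n = a + b + c + d
    row_high, row_low = a + b, c + d      # dimension k marginals
    col_yes, col_no = a + c, b + d        # dimension j marginals
    if 0 in (row_high, row_low, col_yes, col_no):
        raise ValueError("a marginal frequency is zero")

    direction = a * d - b * c             # sign gives direction of relation
    deviation = abs(direction)
    if continuity:
        # Assumed continuity correction; never allowed to go below zero.
        deviation = max(deviation - n / 2, 0.0)
    chi_square = n * deviation ** 2 / (row_high * row_low * col_yes * col_no)
    phi = math.sqrt(chi_square / n)

    return {
        "n": n,
        "chi_square": chi_square,
        "phi": phi,
        "ad_minus_bc": direction,
        "marginals": (row_high, row_low, col_yes, col_no),
        "marginal_proportions": tuple(
            m / n for m in (col_yes, col_no, row_high, row_low)
        ),
        "row_proportions": (a / row_high, b / row_high,
                            c / row_low, d / row_low),
        "column_proportions": (a / col_yes, c / col_yes,
                               b / col_no, d / col_no),
    }


if __name__ == "__main__":
    for key, value in fourfold_stats(30, 20, 10, 40).items():
        print(key, value)
```

Returning AD - BC alongside phi keeps the sign information that the square root discards.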


Program. The computer program, the routine for which is termed
"Bill," is written for an IBM 650 2000-word drum digital computer.
A Flicor deck is loaded with the program deck to allow for the use
of floating point notation. The input data must be in floating point
notation. Moreover, words six and seven must contain zeros. A plus
punch also needs to be placed in columns 10, 20, 30, 40, 50, 60, 70,
and 80. In Figure 2 the input-output format is described. A flow
diagram of the step-by-step procedure is presented in Figure 3.


Program capacity and time of running. When the values A, B,
C, and D are in one digit, two digits, or three digits, respectively
(representing three different data fields), the numbers of item
variables that can be handled per card are 77, 38, and 25, if three
columns are allowed for identification of the subjects. Running time
is approximately 20 minutes.
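
The capacity figures follow from dividing the columns left on an 80-column card by the field width. A minimal Python check, with a hypothetical function name:

```python
CARD_COLUMNS = 80


def items_per_card(field_width, id_columns=3):
    """Whole item fields that fit after reserving the identification columns."""
    return (CARD_COLUMNS - id_columns) // field_width


if __name__ == "__main__":
    print([items_per_card(width) for width in (1, 2, 3)])
```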


MISCELLANEOUS INFORMATION 
Console Settings. 


Storage-Entry Switches: 70 1952 9999+ 

Switch Controls: Programmed—STOP 
Half-Cycle—RUN 
Address Selection—1000
Control—RUN 
Display—Distributor 
Overflow—sense 
Error—STOP 


Input card: 


Output cards:


Figure 2. Input/Output Format


Loading Order.
533 Hopper: Assembly Deck
    Data Cards (N, A, B, C, D, ID)
    Transfer Card
    Program Deck
    Flicor Deck
    Clear memory to zero card


[Flow diagram: the BILL routine, with FLICOR as subroutine, reads
N, A, B, C, D; computes chi square, phi, the marginal sums, and the
marginal and cell proportions, storing each result in a punch word;
punches; and returns to BILL.]

Figure 3. Flow Diagram for Obtaining Chi Squares, Phi Coefficients,
and Proportions of Frequencies in the Cells of a Fourfold Table and
in the Marginal Entries.


Begin Operation.
1. Press console computer reset
2. Press program start
Special Instructions
1. 533 Read-Punch unit
(a) Insert 80/80 control panel
(b) Ready read feed with assembly deck
(c) Ready punch feed with blanks




Obtaining copies of program. Copies of the program deck can be
obtained from the writers at a nominal charge of $2.50 to cover
cost of cards, reproduction, wrapping, and postage. Request should
be made for IBM 650 Program 069.




EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 1, 1962


FITTING THE NORMAL OGIVE ON THE IBM 650 


ELLIOT M. CRAMER 


Biometric Laboratory
The George Washington University 


Introduction 


IN both psychophysical work and psychological testing, a fre- 
quent model is: 


\[ p = \int_{-\infty}^{x} N(m, \sigma)\, dv \quad \text{where} \quad N(m, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(v - m)^2 / 2\sigma^2} \]

and where p is a predicted proportion of favorable responses with x
being an intensity level.

This program obtains the maximum likelihood solution for this
model. The procedure used is similar to the probit method which
is common to biological work. Considerable flexibility has been
built into the program so that many functions may be fit with
minimal data handling and key punching. There may be up to two
hundred levels of x. All computations are in floating point; and
magnetic core, floating point, and index registers are required.


Method 
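We may write:

\[ p = \int_{-\infty}^{(x - m)/\sigma} N(0, 1)\, du = \int_{-\infty}^{a + bx} N(0, 1)\, du \quad \text{where} \quad a = -\frac{m}{\sigma} \ \text{and} \ b = \frac{1}{\sigma}. \]

The maximum likelihood equations for this problem are nonlinear¹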


¹Cornfield, Jerome and Mantel, Nathan. "Some New Aspects of the
Application of Maximum Likelihood to the Calculation of the Dosage
Response Curve." Journal of the American Statistical Association,
XLV (1950).




but Newton's method may be applied to give two linear equations.
An iterative procedure then yields estimates â and b̂ from which
we obtain estimates m̂ and σ̂.

Put y = a₀ + b₀x, given initial estimates a₀ and b₀ of a and b, and
let k equal the number of favorable responses and n equal the total
number of responses at x. Define p as the probability corresponding
to the normal deviate y, and z as the normal ordinate corresponding
to y, and let q = 1 − p.


The equations to be solved are then: 
Aa DZ Q+ Ab У 2:0 = УЕ 
a >> 2Q+ Ab У 00 = Lok 


where the summations are over z and where 


= (0-0 :(- ) «€ 


р E E E 
p n 97 


and 


For those computations

u = z/p and v = z/q are tabled for y from 0.00 to 5.00 in steps of
0.01; u and v are estimated beyond 5.00 as u = 0, v = y + .9325/y.


Given initial estimates a₀ and b₀, the equations may be solved
for improved estimates, a = a₀ + Δa, b = b₀ + Δb. Repeating the
process until Δa ≈ Δb ≈ 0 yields the desired estimates â and b̂ and
thus m̂ and σ̂.
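
The iteration just described can be sketched in Python. This is an illustration under stated assumptions, not a transcription of the 650 program: the function names are hypothetical; the per-level terms r and w are the first derivative and the negative second derivative of the binomial log likelihood with respect to y, derived here from u = z/p and v = z/q rather than copied from the printed equations; u and v are computed directly instead of being read from a table (so very extreme deviates are not handled); and convergence is judged on the absolute size of the corrections rather than on relative error.

```python
import math


def normal_ordinate(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def normal_probability(y):
    return 0.5 * math.erfc(-y / math.sqrt(2.0))


def fit_normal_ogive(x, k, n, a0, b0, tol=1e-6, max_iter=10):
    """Newton iteration for p = Phi(a + b*x); returns a, b, m, sigma."""
    a, b = a0, b0
    for iteration in range(1, max_iter + 1):
        s_w = s_xw = s_xxw = s_r = s_xr = 0.0
        for xi, ki, ni in zip(x, k, n):
            y = a + b * xi
            z = normal_ordinate(y)
            u = z / normal_probability(y)      # z / p
            v = z / normal_probability(-y)     # z / q
            r = ki * u - (ni - ki) * v         # d(log L)/dy at this level
            w = ki * u * (u + y) + (ni - ki) * v * (v - y)  # -d2(log L)/dy2
            s_w += w
            s_xw += xi * w
            s_xxw += xi * xi * w
            s_r += r
            s_xr += xi * r
        det = s_w * s_xxw - s_xw ** 2
        if det == 0.0:
            raise ValueError("equations are ill-determined")
        # Solve the two linear equations for the corrections.
        da = (s_r * s_xxw - s_xr * s_xw) / det
        db = (s_w * s_xr - s_xw * s_r) / det
        a += da
        b += db
        if max(abs(da), abs(db)) < tol:
            return {"a": a, "b": b, "m": -a / b, "sigma": 1.0 / b,
                    "iterations": iteration, "converged": True}
    return {"a": a, "b": b, "m": -a / b, "sigma": 1.0 / b,
            "iterations": max_iter, "converged": False}


if __name__ == "__main__":
    levels = [-2.0, -1.0, 0.0, 1.0, 2.0]
    favorable = [2, 16, 50, 84, 98]
    totals = [100] * 5
    print(fit_normal_ogive(levels, favorable, totals, a0=0.0, b0=1.0))
```

The starting values are passed in explicitly; in practice they would come from a preliminary least squares fit, as the text goes on to describe.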


Initial estimates are obtained by least squares in the form: 


= Qty = 7 


EDT. X ni "cn a= y — 02. 
Standard errors are given by 


1/2 
a (459) where Z = (E QÈ 20) — (È 20) 


and 


1 


o= (yotim 90/2707)“ 


Input Format 


The input cards consist of a control card followed by one or more 
sets of data cards. 


Control Card 


Word One—0b bccc cccc
    where c is the identification code on data cards and b is the
    scale for X measured from the right, e.g. for 110.25, b = 2
Word Two—ab cdef g000
    indicates a series of binary decisions with 1 meaning yes and
    0 no.
    a. Equal spacing between X's.
    b. Equal number of observations at each Xᵢ.
    c. Same Xᵢ as previous set of data.
    d. Same Nᵢ as previous set of data.
    e. Omit tape output for each iteration.
    f. Use a and b from control card as initial estimates.
    g. Omit sequence check.
Word Three—hh iii jjj;
    where h is the maximum number of iterations to be attempted
    (10 is suggested),
    i is the number of values of Xᵢ,
    j is the number of observations at each Xᵢ, if constant.
Word Four—ee eee0 0000
    where e is the tolerance for convergence; 00100 is suggested.
    This is the maximum relative error in â and b̂.
Word Five—X₁ if the Xᵢ are equally spaced (put in floating
    point).
Word Six—spacing if the Xᵢ are equally spaced (floating point).
Word Seven—a in floating point (initial estimate if desired).
Word Eight—b in floating point (initial estimate if desired).
Irrelevant Items Punched As Zero


Data Cards

There are three types of data cards which have punched in them
values of 1) Kᵢ, the number of responses at a point; 2) Nᵢ, the
number of observations at that point; and 3) Xᵢ, the point at which
the observation is made.

On all three cards, Word One is of the form ab bccc cccc where a
is 1, 2, or 3 indicating the type of data card; b is the consecutive
card number within a type; c is an identification common to a
single set of data and is punched on the output cards. Words 2
through 8 have data with either K, X, or N punched in the low
order positions with the decimal point aligned; eight figures are
permitted.

For a single set of data, the cards are entered in sequence on 
card type preceded by the control card. 

Additional sets of data may follow after last card. Number 2 
and/or number 3 cards may be omitted if data points are equally 
spaced and/or the number of observations is the same at each 
point. If the control cards for several sets of data are the same 
except for code word, only the first control card need be inserted. 
This means that, for the special case where the Nᵢ are equal and
there is equal spacing with the values of Xᵢ common to several sets
of data, one control card with the several sets of number one cards
is used. If this is not the case but the Xᵢ and Nᵢ are the same for
each set, enter the first set of data in the ordinary manner; for the
second set, indicate on the control card to use the same Nᵢ and/or
the same Xᵢ and follow this by all the number one cards for the
remaining sets of data.


Output Format 


Word One is dd 00cc cccc, d being the number of iterations
required, while c is reproduced from the input. The next six words
have respectively a, m, b, σ, σ_m, σ_b, in floating point.

The tape output is the same except that σ_m and σ_b are not
computed and values for each iteration are given. On failure of
convergence, Word One is negative for the final iteration.

If equations are ill-determined, a card is punched with σ_m and σ_b
equal to 99 9999 9999.


Timing 
The time required to fit one function is approximately 2 + .5Q 
seconds where Q is the number of intensity levels. 


Operating Instructions 


Load tape 8010 if required for output. 
Use standard 8-word board. 


... may follow. If magnetic tape is used for ...
... the last card should be a transfer to 0300 to write a tape ...
rewind the tape.


0000 end of job.

...1 sequence error; reorder beginning with the card that ...
be read and press start button, or set 8000 to minus and ...
button to proceed ignoring sequence errors.
Tape write error—press start to try again.
Tape write error—press start to try again.


OBTAINING COMPONENTS ESSENTIAL TO A NUMBER
OF STATISTICAL ANALYSES BY USE OF THE
IBM ACCOUNTING MACHINE¹


MONROE M. LEFKOWITZ AND HAROLD E. GREENE
Rip Van Winkle Foundation, Hudson, New York 


IT is often necessary to perform certain statistical analyses which
are too cumbersome for the standard desk calculator and yet do not
warrant the use of an electronic computer. Moreover, an electronic
computer, in many installations, is not standard equipment. Faced
with the particular situation at the Rip Van Winkle Foundation of
having an IBM 402 Accounting Machine but no computer, a procedure
was developed whereby the components involved in a number
of routine statistical operations could be obtained.

Specifically N, ΣX, ΣX², ΣXY, ΣY, ΣY² are terms used in determining
the standard deviation, t, F, and the Pearson r. These terms,
within certain limitations, can be readily obtained on the IBM 402
or 403 accounting machine when equipped with eight co-selectors,
80 counter groups, and a special program feature. Multiplication by
repetitive addition, permitted by the special program feature, is
described in IBM Principles of Operation Bulletin 48 (revised),
Copyright 1952, and the wiring diagram is included in the bulletin.

To obtain the statistical components under consideration, the
control panel wiring described in Bulletin 48 is modified according


¹This paper is part of a larger program of research concerning the
psychosocial development of aggressive behavior and supported by the
United States Public Health Service. Support was obtained also from
the Columbia County Tuberculosis and Health Association, Inc. Free
use of computing equipment was rendered by the Watson Scientific
Computing Laboratory at Columbia University in New York City.




to the accompanying wiring diagram presented in Figure A. Al- 
though this diagram involves the use of 88 type bars, the same 
results can be achieved with the standard 55 type bars. 


Control Panel Functioning 


With X and Y variables punched on the same card, the sequence 
of operations described in Figure A is as follows: 


1. To produce X², set up a shift transfer from counter 8A to
counter 8B.

2. The card count yields N; counter 4B prints out on multiplier
change and counter 4D accumulates total.

3. Counters 6D and 4C accumulate totals to yield ΣXY.

4. Counters 6A and 6B accumulate totals to yield ΣX².

5. The ΣY is accumulated in counter 6C.

6. N and ΣXY are printed out at change of multiplier time.

7. When the multiplier is X then ΣX², ΣXY, ΣY, and N are
printed out at final total time; when the multiplier is Y then ΣY²,
ΣXY, ΣX, and N are printed out at final total time.


Card Handling and Machine Procedure 

1. Cards are sorted on the multiplier field which is also the com- 
paring field. 

2. With set up change switch 1 on, take two final total cycles and 
pass a blank card through the machine. Switch 1 is then turned off. 

3. The variable cards are run through the machine. 

4. Take a final total cycle. 

5. Resort on Y field which will now be the comparing field. 


6. Reverse the multiplier with the multiplicand on the control 
panel and repeat steps 2, 3, and 4. 


Limitations 
1. N cannot be greater than 9999. 
2. The X or Y variable cannot be greater than three digits. 
3. ΣX or ΣY cannot be greater than six digits, and therefore the
magnitude of the X or Y variable is negatively related to the size of
N.


4. For one multiplier the sum of all identical three digit numbers 
cannot be greater than five digits. 


[Figure A. Wiring diagram for the IBM 402 control panel.]


5. For one multiplier ΣX² or ΣXY for any group of identical three
digit numbers cannot be greater than seven digits.

6. The total ΣX², ΣXY, or ΣY² cannot each be greater than ten
digits.

When the values are obtained the final arithmetical computation
is carried out on the desk calculator. For example, the Pearson r can
be solved merely by substituting the appropriate values in formulae
8.3 or 8.4 as stated in Guilford (1950).
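
As an illustration of the final desk-calculator step, the Python sketch below substitutes the six machine totals into the familiar raw-score expression for the Pearson r. The function name is hypothetical, and the expression is the standard raw-score form; it is offered as an assumption about what the cited formulae amount to, not as a quotation of them.

```python
import math


def pearson_r_from_sums(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Pearson r from N, sum X, sum Y, sum X^2, sum Y^2, and sum XY."""
    numerator = n * sum_xy - sum_x * sum_y
    x_part = n * sum_xx - sum_x ** 2
    y_part = n * sum_yy - sum_y ** 2
    if x_part <= 0 or y_part <= 0:
        raise ValueError("r is undefined when a variable has no variance")
    return numerator / math.sqrt(x_part * y_part)


if __name__ == "__main__":
    # Totals for X = 1, 2, 3 and Y = 2, 4, 6.
    print(pearson_r_from_sums(3, 6, 12, 14, 56, 28))
```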

An approximation of the running time on the accounting machine
may be computed by determining the range of scores or response 
alternatives for each variable, summing the digits in each column 
and then adding the sums. This value is added to twice the number 
of cards and then multiplied by the speed of the accounting machine. 
For example, if we run 1000 cards on a series 50 machine and the 
range of scores for the X variable is 151, 152, 153, 154, and 155; the 
sums of the first, second, and third columns respectively are 5, 25, 
and 15. If the response alternatives for the Y variable are 0, 1, 2, 3, 
and 4 then the sum of this column is 10. Adding the sums of the 
columns of both variables to twice the card count and multiplying 
by machine speed of 1/50 (one card per 1/50 minute) yields the 
running time: (5 + 25 + 15 + 10 + 2000) (1/50) = 41.1 minutes. 
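
The worked example can be reproduced with a few lines of Python. The function names and the fixed field widths are assumptions made for the illustration; the rule itself (digit-column sums plus twice the card count, divided by cards per minute) is the one stated above.

```python
def digit_column_sums(values, width):
    """Sum the digits in each column of a zero-padded field."""
    sums = [0] * width
    for value in values:
        for column, digit in enumerate(str(value).zfill(width)):
            sums[column] += int(digit)
    return sums


def running_time_minutes(x_values, y_values, cards, cards_per_minute,
                         x_width=3, y_width=1):
    """Approximate accounting-machine running time in minutes."""
    cycles = (sum(digit_column_sums(x_values, x_width))
              + sum(digit_column_sums(y_values, y_width))
              + 2 * cards)
    return cycles / cards_per_minute


if __name__ == "__main__":
    print(digit_column_sums(range(151, 156), 3))
    print(running_time_minutes(range(151, 156), range(0, 5), 1000, 50))
```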

REFERENCES 
Guilford, J. P. Fundamental Statistics in Psychology and Education
(Second Edition). New York: McGraw-Hill Book Company, 1950.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT
Vol. XXII, No. 1, 1962


A METHOD FOR DERIVING “FLEXIBLE” 
SOCIOMATRICES FROM RESPONSE FORMS 
APPROPRIATE TO CHILDREN IN THE THIRD GRADE¹


LEOPOLD O. WALDER, HAROLD E. GREENE, 
AND DONNA D. LEFKOWITZ 


Rip Van Winkle Foundation, Hudson, New York 


THIS report describes a system for converting peer-rating nominations
by children in the third grade into a useful punched-card
format. The third grader seems to need response forms tailored to
his immature reading, judgmental, and writing skills. However,
such response forms are largely unusable by the data analyst.

For any single peer-rating item, e.g., “Who always sits around
...?” or “Who is a pest?” or “Who would you like to play with?,”
the end result desired is a binary sociomatrix with marginal sums
and percentages. Figure 1 depicts such a single-item matrix. The
elements are binary, 1 for a choice and 0 for a nonchoice. Each row
represents the choices made by a single judge; each column represents
the choices of a single object made by all judges. Marginal
entries for the rows are judge scores; marginal entries for the
columns are object scores. A row sum represents the number of choices
by a judge; a column sum, the number of judges choosing an
object. A row percentage is calculated by dividing the number of
choices made by a judge by the number of objects available for him


¹This data processing system was developed as part of the research
supported by the Rip Van Winkle Foundation, the Columbia County TB
and Health Association, Hudson Lions Club, and National Institute of
Mental Health (Grant M-1726). Grateful acknowledgment is extended to
the Watson Scientific Computing Laboratory at Columbia University
for making computer time available.




[Schematic: rows are JUDGES (Boys, Girls); columns are OBJECTS
(Boys, Girls); row margins are JUDGE SCORES; column margins are
OBJECT SCORES.]

Figure 1. Schematic of a sociomatrix partitioned by sex of judge and
sex of object.


to choose.² A column percentage is calculated by dividing the number
of judges choosing an object by the number of judges who could
have made the choice. (An alternate denominator uses the number
of choices made by the judges. This type of column percentage is
also calculated by this system.) This matrix may be partitioned (as
indicated in Figure 1) in terms of the sex of the judges and of the
objects into four submatrices. Marginal sums and percentages may


²Sociometric procedures vary in the number and kind of choices
allowed. The system described here was developed to handle any number
of choices, ranging from no choices to choosing everyone in his
classroom except himself.




be calculated for each of the submatrices and also for the un- 
partitioned matrix. 

This method was designed to take care of specified kinds of re- 
sponse forms. Thus a short description of the response form is in 
order. A more complete description is reported elsewhere (Walder,
et al., 1961). Each child is given a peer-rating booklet made up of 
a number of identical pages. Each page contains the names of the 
children in the classroom arranged in two lists, one with the boys’ 
names and the other with the girls’. To the left of each name is a 
two-digit number. The topmost boy's name has 01 to the left; the 
next has 02; the next, 03; and so on to the high-boy number for 
the bottom boy's name. Similarly, the first girl's number is 31 and 
goes to the high-girl number. The children were instructed to mark 
names of as many of their classmates as fit each question, as read 
by the examiner. They were told not to mark their own name ex- 
cept on the first page, the question being “Who are you?” 

One page was used for each question. The page number, written 
at the bottom of each page, is the item number. A peer-rating ses- 
sion with 25 questions would require a booklet with 26 pages, page 
01 being for the identification item and page 02 for the first
peer-rating item. The peer-rating session with K items thus yields a
booklet of (K + 1) pages for each judge present. 


Row Vector Cards 


One card is punched for each page (except the first). Thus a
26-page booklet used to collect peer-rating data on 25 items would
result in 25 cards being key punched. The total number of cards
for 10 judges in such a case would be 250 key-punched cards.
Identification and control information, plus up to 35 two-digit
choices, are key punched. (Out of about 20,000 response sheets marked
by about 900 children in 38 classrooms, only one sheet contained more
than 35 choices and thus could not fit onto this kind of a card.)

Cards are now ready to be processed by the Sociomatrix Program
on a basic IBM 650. Each input card results in one output card,
called a JUDGE card. Each JUDGE card contains one row of a
one-zero choice matrix plus row sums and percentages both for the
partitioned and unpartitioned vector. A number of inconsistencies
and errors in the input card are signaled in the output card. Such
things as certain kinds of key-punching errors and eliminated
self-choices are included in the errors signaled. The length of the
row (or choice field) is determined by high-boy and high-girl values.
These values also contribute to the denominators of the row
percentages. By program option, different high-boy and high-girl
values from those already on the input card can be inserted from the
console.

In effect, the choices in the input card result in tallies of 1 in the 
columns of the output card corresponding to the two-digit designa- 
tion of the objects chosen. This is true for choices (columns) from 
01 through the high-boy value and from 31 through the high-girl 


value but excluding a self-choice (a two-digit choice equal in value 
to the judge’s own number). 
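
The construction of one JUDGE row can be sketched in Python. This is a hedged illustration, not the Sociomatrix Program: the function name, the returned fields, and the list of dropped choices are hypothetical, girls' numbers are assumed to start at 31 as described for the booklet, and the percentage denominators are assumed to exclude the judge himself.

```python
def judge_row(judge, choices, high_boy, high_girl, first_girl=31):
    """Build one binary row of the sociomatrix with partitioned sums."""
    boys = range(1, high_boy + 1)
    girls = range(first_girl, high_girl + 1)
    objects = list(boys) + list(girls)

    # Keep choices that name a listed classmate other than the judge.
    chosen = {c for c in choices if c in objects and c != judge}
    dropped = [c for c in choices if c not in chosen]  # self or out of range

    row = {obj: int(obj in chosen) for obj in objects}
    boy_sum = sum(row[obj] for obj in boys)
    girl_sum = sum(row[obj] for obj in girls)

    # Assumed denominators: objects available to this judge.
    boys_available = len(boys) - (judge in boys)
    girls_available = len(girls) - (judge in girls)

    def percent(count, available):
        return 100.0 * count / available if available else 0.0

    return {
        "row": row,
        "sums": (boy_sum, girl_sum, boy_sum + girl_sum),
        "percentages": (
            percent(boy_sum, boys_available),
            percent(girl_sum, girls_available),
            percent(boy_sum + girl_sum, boys_available + girls_available),
        ),
        "dropped": dropped,
    }


if __name__ == "__main__":
    print(judge_row(judge=2, choices=[1, 2, 33, 40], high_boy=3, high_girl=33))
```

Column sums and percentages then follow by adding the rows of whichever JUDGE records are selected to form a matrix.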


Object Score Cards 


Sociomatrices may be formed quite flexibly by selecting the 
JUDGE cards which will constitute the rows. Once selected, the 
JUDGE cards serve as source cards to derive column sums and 
percentages. This is done in two phases. First, condensed object cards 
are built for each of four submatrices with an accounting machine 
and summary punch. Each of these cards, to be used as input to 
the 650, contains two-digit sums for up to 26 matrix columns plus 
the number and kind of JUDGE cards used to generate that card. 
One such condensed object card handles the column sums for each
of the four submatrices.


These cards are then processed on the basic 650 by the Object
Scoring Program, which generates for each matrix column an OBJECT
card which contains the sums and percentages for the partitioned
and unpartitioned column vectors.

Thus a rather complete statement of the matrix can be generated
by this system. In addition to certain error signals (as well as other
information too detailed to present here),³ the following output is
available: (1) the binary elements of the choice matrix in row
vectors, (2) the row sums and percentages for the partitioned and


³A more detailed description of the processing system with explicit
statements about card format, the three program listings
(Sociomatrix, Object Scoring, and Item Adder) ...
Document number 6928 with the ADI ...
duplication Service, Library of Congress ...
be secured by remitting $2.50 for 35 m...




unpartitioned matrix, and (3) the column sums and percentages 
for the partitioned and unpartitioned matrix. 

Finally, we have available an Item Adder Program for the basic 
650 which will sum OBJECT cards and recompute the percentages 
to yield composite object scores. 


REFERENCES 


Walder, L. O., Abelson, R. P., Eron, L. D., Banta, T. J., and Lau- 
licht, J. H. “Development of a Peer-Rating Measure of Aggres- 
sion.” Psychological Reports, IX (1961), 497-556. (Monograph 
Supplement 4-V9) 


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 1, 1962


COMPUTER PROGRAM FOR THE 
MEEHL-DAHLSTROM MMPI PROFILE RULES. 


BENJAMIN KLEINMUNTZ 


AND 
L. BARTON ALEXANDER 
Carnegie Institute of Technology 


MEEHL and Dahlstrom (1960) have derived a set of objective
decision rules as an aid in the discrimination between “neurotic”
and “psychotic” MMPI profiles. In applying the Meehl-Dahlstrom
configural rules to some available MMPI records of neurotic and
psychotic hospital patients, the authors found them useful but
extremely time-consuming. Consequently, we programmed the decision
rules for the IBM 650 model electronic computer. Whereas
previously it required from 5 to 20 minutes of clerical time for
performing the MMPI profile analysis, the programmed rules permit
us to process a case in about 3 to 10 seconds. The actual processing
time, of course, depends on where in the rules a classification is
made and upon the speed of the computer used. It is the purpose of
this paper to describe the computer program which was devised for
applying the Meehl-Dahlstrom rules to large samples of patients.

The Carnegie Institute of Technology GATE (General Algebraic
Translator Extended) System (Perlis, Van Zoeren & Evans, 1959)
is an algebraic coding system which facilitates the writing of
computer programs by converting the source program, which is written
in mathematical language with a restricted set of symbols, into the
language of the particular computer to be used. The present program
of 180 GATE statements (cards) was translated and compiled
into 328 machine language statements. The program follows
consecutively the 16 rules comprising the Meehl-Dahlstrom evaluation,
reaching either a classification or a proceed-to-the-next-rule
decision.


The input data of the program consist of eleven raw scores
(K-corrected where necessary), a profile identification number, sex
identification, and a clinical classification. The output consists of
each profile identification number, the clinical and program
diagnostic evaluations, a total count of the number of cases
classified, and the number of cases where the clinical and the program
classifications matched correctly (e.g., the number of “hits”
achieved by the program). A trace of the program using conditional
type statements can be included so that the particular rule upon
which the classification was made may be identified.

There are four classifications which can be achieved by the pro- 
gram: 1) Rules not applicable, 2) neurotic, 3) psychotic, and 4) in- 
determinate. These are listed in Table 1 with their corresponding 
program symbol and values. For example, Z0 = 10 and Z1 = 10
are program and clinical classifications, respectively, of the neurotic 
diagnostic category. 

Presented in Table 2 are the MMPI scales with their corresponding
program symbols for MMPI raw scores, calculated T-scores,
and position of the scales in the Hathaway code (Hathaway, 1947).

The D symbols (D0 through D10) in Table 2 are fixed point
(integer) variables which are assigned the K-corrected MMPI raw
scores as input data. The X symbols are floating point variables
(decimals) which are assigned the calculated T-score values by the
program. These calculations are derived from linear relationships
between raw and T-scores and are carried out by a segment of the
program (statements 2, 3, and 164-178). Symbols J3 through J10
are assigned values by statements 155 through 159, and correspond
to relative scale elevations in the Hathaway code (e.g., an ordering
of the clinical scales from the highest to the lowest T-score
values). For example, J5 = 1 means that the MMPI D scale has the
highest elevation or that it appears first in the Hathaway code.
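
The two steps described here, a linear raw-to-T conversion followed by a ranking of the clinical scales, can be sketched in Python. The sketch is illustrative and does not reproduce the GATE statements: the function names are hypothetical, the slope and intercept passed in are placeholders for whatever linear relation applies to a scale, and ties are broken by the order in which the scales are listed, which is an assumption.

```python
def linear_t_score(raw, slope, intercept):
    """T-score from a raw score under an assumed linear relation."""
    return slope * raw + intercept


def hathaway_positions(t_scores):
    """Rank scales from highest (position 1) to lowest T-score.

    t_scores maps scale name to T-score; ties keep their listed order.
    """
    ordered = sorted(t_scores, key=lambda scale: -t_scores[scale])
    return {scale: position for position, scale in enumerate(ordered, start=1)}


if __name__ == "__main__":
    scores = {"Pt": 60.0, "Sc": 55.0, "D": 75.0, "Hs": 65.0,
              "Hy": 62.0, "Pa": 50.0, "Pd": 58.0, "Ma": 45.0}
    positions = hathaway_positions(scores)
    # positions["D"] plays the role of J5 in the program's notation.
    print(positions)
```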

TABLE 1
Symbols and Values for Program and Clinical Classifications

                                  Representative Values
                          Rules Not
Classification   Symbol   Applicable   Neurotic   Psychotic   Indeterminate

Program          Z0       0            10         20          30
Clinical         Z1       0            10         20          30


TABLE 2
Representative Program Symbols for MMPI Scales

MMPI     Input        Calculated   Position in
Scale    Raw Score    T-Score      Hathaway Code

L        D0           X0
F        D1           X1
?        D2           X2
Pt       D3           X3           J3
Sc       D4           X4           J4
D        D5           X5           J5
Hs       D6           X6           J6
Hy       D7           X7           J7
Pa       D8           X8           J8
Pd       D9           X9           J9
Ma       D10          X10          J10


In Table 3 the complete program is presented and the parallel
Meehl-Dahlstrom rules are notated to the right of the statements.
Familiarity with the following definitions of operations for the
GATE statements may be helpful in understanding the program:


1. Statement numbers (first column) ranging from 1 to 179 are
reference points and are used by the command statements.
2. Algebraic symbols, such as +, −, =, /, * represent addition,
subtraction, equality, division and multiplication, respectively.
3. Conditional test symbols, Q, R, U, V and W test for less than,
less than or equal to, equal to, greater than, and equal to or greater
than relationships between the symbols or numbers involved.
4. Conditional statements (IF): The operation to the left of the
IF is performed only if all criteria to the right are satisfied.
Multiple IF statements and multiple operations (separated by an M)
are performed from right to left.
5. Examples of control statements are noted below:
a. 159, I6, 3, 1, 10, as in the iteration statement 155, causes
execution of all statements through 159, first with the variable
I6 equal to 3, then incremented by one on each recursion, until
I6 = 10. Transfer following the recursion with I6 = 10 is to the
next statement, 160.
b. Go to 33 IF Y5 Q 7, as in statement 25, causes transfer
to statement 33 if the value of the variable Y5 is less than 7. If
it is not, transfer is to the following statement, 26.
c. Go to 154 M Y1 = 1. IF Y0 R − 31, as in statement 12,
causes the value of Y1 to be set at 1, then transfer to statement




TABLE 3 
GATE Statements for the Meehl-Dahlstrom Rules 


Load and Go 

350 Used in Subroutines 

180 Is Highest Statement Number 

DIMENSION D(10) I(8) J(10) X(10) Y(9) Z(3)


1 10, ро, -+> D10, Z1, Z2, 23 СМ RE 

2    X2 = 0.3 * D2 + 41.00
3    X0 = 3.3333333 * D0 + 36.666666
4    GO TO 164
5    GO TO 9 IF X0 Q 70. IF X1 Q 80. IF X2 Q 60.      Rules Applicable test
6    Z0 = 0.
7    T I0 T Z0
8    GO TO 1
9    Y5 = 0. M I2 = 0 M I3 = 0 M I4 = 0 M I5 = 0
10   Y6 = 0. M Y8 = 0. M Y9 = 0.
11   Y0 = (X3 + X4) - (X5 + X6)                       Beta Calculation
12   GO TO 154 M Y1 = 1. IF Y0 R -31.                 Band Locations
13   GO TO 154 M Y1 = 2. IF Y0 R -11.
14   GO TO 154 M Y1 = 3. IF Y0 R 6.
15   GO TO 154 M Y1 = 4. IF Y0 R 25.
16   GO TO 154 M Y1 = 5.
17 18, I1, 0, 1, 10, 
18 [2 =1IF XI1W 70. 
19 89 TO221F 1201 
20 prt + X5 + X7)/3. + (X5 + X3) - 

(Хе + X7) Anxiety Index 
23. Yá e t X5 + Х3)/(Х7 + X9 + X10) Internalization Ratio 
22 24, 11, 8, 1, 10, 
28 Ү5 = Y5 1 .IF ХІІ W 80. 
24 Y6 = Y6 + 1. IF X11 W 90. 
25 GO TO 33 IF Y5 5Q7. Elevation Rule 
26 GO TO33IF Y6 Q 5. 

1IFY2 R — 15. $ 

28 GO TO 161 M Z0 = 20. Classify Psychotic (P) 
1 T22 T23 


31 Z0 = 30. Classify Indeter- 


minate (I) 
32 GO TO 161 
33 GO TO35IF J10U 1 Manic Rule 
34 GO TO 45 
35 GO TO 39 IF X5 Q 50. 
О a IF X6 Q 50. 


rr IF (X5 + X3) W 115. 
40 GO TO 45 IF (X10 — 15.) Q X5 
41 GO TO 45 IF (X10 — 15.) Q X6 
42 GO TO 45 IF (X10 — 15.) Q X7 


43 Z0 = 20. Classify (P) 
44 GOTO 161 


45 46, I1, 3, 1, 10, Normal Profile Rule 
46 I3 = 11F XI W 70. 


TABLE 3 (Continued) 
GATE Statements for the Meehl-Dahlstrom Rules 


TO 70 IF 13 U 1 
T1, 0, 1, 10, 
{IF XI V 55. 


TO 31 IF 74 U 0 Classify (1) 
TO 54 IF Y4 V .90 j 

TO 31 IF Y2 R 0. Classify (I) 

TO 28 Classify (P) 
TO 60 IF Y2 V — 10 4 

TO 31 IF J9 U 1 Classify (I) 

TO 31 IF J8 U1 Classify (D) 

TO 31 IF J4 U1 Classify (D — 

= 10 Classify Neurotic (N) 

30 TO 161 
TO 63 IF X8 W X9 
TO 65 M Y8 = X9 IF X9 W X10 

10 TO 65 M Y8 = X10 
Es Y8 = X8 IF X8 W X10 
= X10 
TO 65 IF X5 W Y8 

69 IF X7 W Y8 
TO 28 Classify (P) 
TO 58 IF X7 W Y8 Classify (N) 
TO 31 Classify (I) 

O TO 74 IF X0 Q 60. Fake Good Rule 

IO TO 58 IF Y1U 1. Classity (N) 
TO 31 IF Y1 U 2. Classify (1) 

О TO 28 Classify (P) vu 
TO 81 IF 72 U 0 Psychotic Code Rule 
= 0. 
= Y9--1.1FJ9 R3 
= Y9 --1.IF J8 R3 
115783 
= Y9 + 1. IF J10 R3 ү 
TO 28 IF Y9 W 3. Classify (P) 
TO 83 IF X8 W 70. Slope Rule 
TO 89 IF X4 Q 70. 

EO S5 IF x8 W X5 IF X8 W X6 IE ХВАТ 
9 
HOST IF X3 W X5IF X3 W X6 IE X3 W XT 
89 | 

О TO 28 IF X4 W X5 IF ХА W X6 IF X4 W X7 Classify (P) 
TO 89 > 

j TO 94 Ir Jo v 1 4' Rule 

94 IF X9 Q 70. А 

( TO 28 IF Y1 W 4. IF Y2 W 0. Por А) 

I 0. Classif, 

OTO à F Y1 R2. IF Y2Q Classify (D 
TO 100 Ir J8 V1 6' Rule 

IF X8 Q 70. 
то 28 IF X8 W 80. Classify © 

IF Y1U 1. Classify 

TO 31 IF Y2 R 20. IF (X8 — X3) R10. Classify (D 




TABLE 3 (Continued) 
GATE Statements for the Meehl-Dahlstrom Rules 


I5 = I5 + 1 IF X4 W X3
GO TO 142 IF J10 U 1
GO TO 143 IF J4 V 1
I5 = I5 + 1
GO TO 145 IF X4 W X5
GO TO 146 IF X9 Q X5
I5 = I5 + 1
GO TO 28 IF I5 U 3 Classify (P)
GO TO 58 IF I5 U 0 Classify (N)
GO TO 31 Classify (I) 
GO TO 132 IF X5 Q 75. Band 5 Rule 


GO TO 28
GO TO 102 IF J4 U 1 IF J9 U 2 8'4 Rule
GO TO 104 

GO TO 104 IF X4 Q 70. 

GO TO 28 IF (X4 — 10.) W X3 Classify (P) 
GO TO 107 IF X4 Q 80. Sc-Pt Rule 
GO TO 28 IF (X4 — 10.) W X3 IF Y2 V —60. Classify (P) 
GO TO 28 IF X3 R X8 IF X3 R X4 IF X8 W 70. Classify (P)
GO TO 109 IF X8 Q 70. Pa-Pt Rule 
GO TO 28 IF (X8 — 10.) W X3 IF Y1 V 1. Classify (P) 
GO TO 115 IF Y1V 1. Band 1 Rule 
GO TO 31 IF X1 W 70. IF X9 W 70. Classify (I) 
GO TO 31 IF X9 W 65. IF X8 R 45. Classify (I) 
GO TO 31 IF Y2 W 0. Classify (I) 
GO TO 31 IF X5 W 100. IF (X5 — X10) W 60. Classify (I) 
GO TO 58 Classify (N) 
GO TO 119 IF Y1 V 2. Band 2 Rule 
GO TO 31 IF X5 W 100. IF (X5 — X10) W 60. Classify (I) 
GO TO 31 IF X8 W 75. Classify (I) 
GO TO 58 Classify (N) 
GO TO 138 IF Y1 V3. Band 3 Rule 
GO TO 130 IF X5 Q 85. 

GO TO 124 IF (X9 — 10.) Q X6 

GO TO 28 IF X9 W 70. Classify (P) 
GO TO 28 IF X8 W 70. Classify (P)
GO TO 126 IF J5 U 1 IF J3 R 2
GO TO 31 Classify (I)
GO TO 128 IF X4 Q 80. IF X6 Q 80. IF X7 Q 80. IF X8 Q 80. IF X9 Q 80. IF X10 Q 80.

GO TO 31 Classify (I) 
GO TO 58 IF X9 R 70. IF X8 R 70. Classify (N) 
GO TO 31 Classify (I) 
GO TO 31 IF X1 W 70. Classify (I) 
GO TO 31 IF J9 U 1 Classify (I)
GO TO 31 IF J8 U 1 Classify (I)
GO TO 31 IF J4 U 1 Classify (I) 
GO TO 31 IF J10 U 1 Classify (I) 
GO TO 31 IF Y2 W —10. Classify (I) 
GO TO 58 Classify (N) 
GO TO 149 IF Y1 V 4. Band 4 Rule 




TABLE 3 (Continued) 
GATE Statements for the Meehl-Dahlstrom Rules 


150  GO TO 28 IF X9 W 75. IF X9 W X7 IF X4 W X3            Classify (P)
151  GO TO 31                                              Classify (I)
152  GO TO 58 IF (X3 - 10.) W X4 IF X4 R 80. IF X8 R 70.   Classify (N)
153  GO TO 31                                              Classify (I)
154  Y2 = (X9 + X8) - (X6 + X7)
155  159, I6, 3, 1, 10,                                    Hathaway Code
156  I8 = 1
157  158, I7, 3, 1, 10,
158  I8 = I8 + 1 IF XI6 Q XI7
159  JI6 = I8
160  GO TO 17
161  Z2 = Z2 + 1. IF Z0 U Z1                               "Hit" Count
162  Z3 = Z3 + 1.                                          Classification Count
163  GO TO 29
164  X1 = 2.3076923*D1 + 43.076923                         F Scale
165  X6 = 2.50*D6 + 22.50                                  Hs Scale
166  X5 = 2.50*D5 + 7.50                                   D Scale
167  X7 = 1.8181818*D7 + 20.0                              Hy Scale
168  X9 = 2.3809521*D9 + 4.7619040                         Pd Scale
169  X8 = 2.9166007*D8 + 27.083335                         Pa Scale
170  X3 = 2.0*D3 + 4.0                                     Pt Scale
171  X4 = 1.9354839*D4 + 6.7741923                         Sc Scale
172  X10 = 2.50*D10 + 7.50                                 Ma Scale
173  GO TO 5 IF IU 0
174  X6 = 2.0*D6 + 24.0 IF I1 U 1                          Hs Scale
175  X5 = 1.9144441*D5 + 12.50                             D Scale
176  X7 = 1.7047059*D7 + 17.058823                         Hy Scale
177  X3 = 1.6333333*D3 + 9.1000009                         Pt Scale
178  X4 = 1.5333333*D4 + 15.000009                         Sc Scale
179  GO TO 5


PROGRAM END 


154 if the value of the variable Y0 is less than or equal to -31.
Otherwise, Y1 is unchanged, and transfer is to the following
statement, 13.
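
The conditional test symbols and the band-location logic of statements 11 through 16 can be sketched in Python as follows. This is a hedged illustration rather than a translation of the GATE compiler's behavior: the table name `GATE_TESTS` and the function `band_location` are hypothetical names introduced here, and only the cut-offs (-31, -11, 6, 25) come from the listing.

```python
import operator

# GATE conditional test symbols as defined in item 3 of the list above.
GATE_TESTS = {
    "Q": operator.lt,  # less than
    "R": operator.le,  # less than or equal to
    "U": operator.eq,  # equal to
    "V": operator.gt,  # greater than
    "W": operator.ge,  # equal to or greater than
}


def band_location(x3, x4, x5, x6):
    """Return the band (Y1) for T-scores X3 (Pt), X4 (Sc), X5 (D), X6 (Hs)."""
    y0 = (x3 + x4) - (x5 + x6)  # statement 11
    # Statements 12-15: the first cut-off satisfied by the R test fixes the band.
    for limit, band in ((-31, 1), (-11, 2), (6, 3), (25, 4)):
        if GATE_TESTS["R"](y0, limit):
            return band
    return 5  # statement 16: no cut-off satisfied
```

For example, `band_location(50, 50, 70, 70)` gives Y0 = -40 and therefore band 1, matching the description of statement 12.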


REFERENCES 


Hathaway, S. R. “A Coding System for MMPI Profiles.” Journal
of Consulting Psychology, XI (1947), 334-337.
Meehl, P. E. and Dahlstrom, W. G. “Objective Configural Rules for
Discriminating Psychotic from Neurotic MMPI Profiles.” Journal
of Consulting Psychology, XXIV (1960), 375-387.
Perlis, A. J., Van Zoeren, H. R., and Evans, A., Jr. GATE: Algebraic
Compiler with Segmenting and Library F... Unpublished mimeographed
manual, Carnegie Institute of Technology Computation Center, 1959.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT
VOL. XXII, No. 1, 1962


COMPARISON OF SOME COMPUTER TECHNIQUES
FOR FACTOR ANALYTIC ROTATION¹,²,³


R. K. EYMAN, H. F. DINGMAN, AND C. E. MEYERS*
Pacific State Hospital 


Introduction 


Graphic rotations of factor analytic solutions usually involve
excessive amounts of time of sophisticated personnel. Scientifically,
hand rotations have been criticized as being subjective so that
uniquely determined factors are impossible (Kaiser, 1958). Analytic
techniques have been proposed which can be performed on electronic
computers and purportedly offer unique nonsubjective solutions.
However, the relative value of the current analytic solutions is still
questioned (Wrigley, 1958; Cattell, 1960).

The purpose of this paper is to compare the results of a variety
of rotational solutions representing both graphic and analytic methods
regarding two specific factor matrices. A number of previous
studies (Marks, et al., 1960; Fruchter, 1958; Wrigley, 1958; Kaiser, ...)
have made similar comparisons revealing substantial differences
between the graphic and analytic solutions. In contrast, agreement
between these solutions has also been demonstrated (Kaiser, 1958;
Cattell, 1958; Kaiser, 1959; Carroll, 1957; Pinzka-Saunders, 1954).
The specific data used in this study necessarily limit the generality
of the findings. However, additional comparative data regarding
alternative methods of rotation, many of which are currently
programmed for computers, may be of interest.

Supported in part by the National Institute of Mental Health Grant
No. ... Data Processing in Mental Deficiency, Pacific State Hospital ...
... the Western Data Processing ... of the Graduate School of Business
Administration, Univer...
... H. Geertsma at the UCLA Medical Center revised Carroll's ...
... the University of Southern California.


Procedure 


As part of a research program at Pacific State Hospital (a California
state institution for the mentally retarded) tests were developed
and administered to 100 institutionalized mentally retarded
and 100 “normal” children in a neighboring community, all with
mental ages (MA's) of approximately six years. These tests were
constructed to be relatively factor pure and to measure factors
well-identified in adults and college students. The tests were
intercorrelated with an IBM 709 program for computing Pearson
product-moment correlations, giving two intercorrelation matrices (one
for the normals and one for the retarded). Tests 1, 2, and 3 were tests
of psychomotor ability; tests 4, 5, and 6 were of perceptual speed;
tests 7, 8, and 9 were of psycholinguistic ability; tests 10, 11, and
12 were of figural reasoning. Test 13 was a test of immediate auditory
memory span and was employed to give the examiner some
indication of how well the patients and normals were attending to
the testing tasks. Thus, there was every reason for expecting a clear
identification of factors. Based on literature regarding adults, college
students, and Air Force cadets, orthogonal structure and positive
manifold would be expected (French, 1951; Guilford, 1956b).
The details of the testing program and the other tests used are
presented elsewhere (Meyers, et al., in press).


Results 


An IBM 650 program was employed for performing the arbitrary
factor extraction by the centroid method for these data. The
communalities were handled by automatically choosing the largest
element in each column. No iterating was done. The centroid solutions
for these data are presented in Table 1.
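
The extraction of a first centroid factor with this communality rule can be sketched in Python. This is a minimal illustration, not the IBM 650 program: the function name is hypothetical, "largest element" is taken here to mean the largest absolute off-diagonal correlation in the column (an assumption), and the sign reflections needed before extracting later factors are omitted.

```python
import math


def first_centroid_factor(r):
    """Return (loadings, residual) for the first centroid factor of matrix r."""
    n = len(r)
    m = [row[:] for row in r]  # work on a copy of the correlation matrix
    # Communality estimate: largest absolute off-diagonal entry of each column.
    for j in range(n):
        m[j][j] = max(abs(r[i][j]) for i in range(n) if i != j)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    total = sum(col_sums)
    # Centroid loadings: column sums divided by the square root of the grand total.
    loadings = [s / math.sqrt(total) for s in col_sums]
    # Residual matrix after removing the first factor.
    residual = [[m[i][j] - loadings[i] * loadings[j] for j in range(n)] for i in range(n)]
    return loadings, residual


# Made-up three-variable example with all intercorrelations equal to 0.5.
loadings, residual = first_centroid_factor(
    [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
)
```

In this artificial case a single factor reproduces the reduced matrix, so the residuals are essentially zero.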

A manual rotation was attempted with the Zimmerman (1946)
graphic method. After many attempts, it was apparent that an
orthogonal manual solution was not possible. Rotations were
done by the Thurstone (1947) method of direct rotation of reference
axes, to the criteria of simple structure and positive manifold.
The results of this solution are presented in Table 2.

Subsequently, various analytic methods of rotation were employed
that furnished a basis for comparison with the manual
graphic rotation. Two orthogonal and two oblique methods were
used to explore all possibilities of analytic solutions.


TABLE 1
Centroid Loadings for 100 Normal and 100 Retarded Children
[Table body not recoverable from the source scan. Columns: centroid factors for the Normal at MA 6 Level and Retarded at MA 6 Level groups; rows: the 13 tests.]



TABLE 2
Thurstone Graphic Oblique Solution
(Reference Structure Loadings V)

      Normal at MA 6 Level                 Retarded at MA 6 Level
Test  A     B     C     D     E     F      A     B     C     D     E
 1    414   -108  -102  170   -073  -232   623   073   118   -007  -138
 2    581   029   098   -131  -120  218    528   126   013   -037  064
 3    495   037   -015  -214  100   266    602   084   -027  -012  -006
 4    -014  491   043   -040  001   032    -040  480   -013  -039  220
 5    182   -044  454   -032  032   208
 6    003   475   -004  000   050   -012   027   427   180   081   -076
 7    124   -135  493   189   -170  -002   -091  045   633   -012  -012
 8    227   042   459   -001  -090  -291   104   013   500   226   -319
 9    -071  -127  325   -137  212   255    -093  099   385   -096  190
10    -059  -185  085   293   149   -280   061   094   008   364   -042
11    019   -046  153   478   -147  151    -280  236   000   439   -053
12    085   -140  -037  620   -058  142    091   018   -077  334   135
13    -090  -006  -029  161   3/3   029    002   -147  -005  000   433


Correlations Between Primary Factors

Normal at MA 6 Level
      A     B     C     D     E     F
A     Tis) 2 ena 04 — 850
B                 689   716   610   -327
C                       551   775   -9275
D                             5900  -399
E                                   -275
F

Retarded at MA 6 Level
      A     B     C     D     E
A           506   619   668   647
B                 451   735   706
C                       519   553
D                             805
E



The first analytic rotational solution attempted on the data in
Table 1 employed the “quartimax method” (Neuhaus and Wrigley,
1954), which determines an orthogonal transformation that will
carry the original factor matrix into a new factor matrix such that
the sum of the fourth powers of the rotated factor loadings is a
maximum. An IBM 650 program was used employing the above
method. The results are presented in Table 3.
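
The quartimax idea can be illustrated for a single pair of factors with a short Python sketch. This is not the IBM 650 program: the function names, the grid search over angles, and the four made-up loadings are assumptions used only to show that the sum of fourth powers peaks where each test loads on one factor.

```python
import math


def rotate_pair(loadings, angle):
    """Orthogonally rotate two-factor loadings by the given angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]


def quartimax_criterion(loadings):
    """Sum of the fourth powers of all loadings."""
    return sum(x ** 4 for row in loadings for x in row)


def best_angle(loadings, steps=900):
    """Grid search over [0, 90) degrees for the angle maximizing the criterion."""
    angles = [k * (math.pi / 2) / steps for k in range(steps)]
    return max(angles, key=lambda t: quartimax_criterion(rotate_pair(loadings, t)))


# Made-up loadings with perfect simple structure, disturbed by a 30-degree rotation.
start = rotate_pair([(0.8, 0.0), (0.7, 0.0), (0.0, 0.6), (0.0, 0.9)], math.radians(30))
angle = best_angle(start)
```

A further rotation of 60 degrees restores the simple structure (up to an interchange and reflection of the two factors), and that is where the search settles. Practical programs rotate all pairs of factors repeatedly and solve for each angle analytically rather than by grid search.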

An orthogonal normal varimax solution (Kaiser, 1958) was attempted
using an IBM 709 program. The varimax method, which
is a modification of the quartimax method, places more emphasis
on simplifying the columns, or factors, of the factor matrix. The
normal varimax criterion was employed, in which a rotation is conducted
with respect to normalized common-parts of the test. Table
4 presents the results of the normal varimax solution.
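
The normal varimax criterion can be written as a short Python function. The sketch below follows the usual statement of Kaiser's criterion (the variance of the squared, communality-normalized loadings within each column, summed over columns); the function name is hypothetical, and it is assumed that every test has a nonzero communality.

```python
import math


def normal_varimax_criterion(loadings):
    """Sum over factors of the variance of squared normalized loadings."""
    n = len(loadings)
    normalized = []
    for row in loadings:
        h = math.sqrt(sum(x * x for x in row))  # square root of the communality
        normalized.append([x / h for x in row])
    total = 0.0
    for j in range(len(loadings[0])):
        squared = [row[j] ** 2 for row in normalized]
        mean_sq = sum(squared) / n
        total += sum(v * v for v in squared) / n - mean_sq ** 2
    return total
```

A matrix in which each test loads on exactly one factor gives a large value, while loadings spread evenly across factors give a value near zero, which is the sense in which the method simplifies columns.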

TABLE 3
Quartimax Solution

Normal at MA 6 Level          Retarded at MA 6 Level

138 600 003 002 005 778 —172 —157 —083 024
$24 O54 014 088 074 025 838 124
080 152—012 —210 002 607 —002 —014 —001 —273


TABLE 4
Varimax Solution

      Normal at MA 6 Level                 Retarded at MA 6 Level
Test  A     B     C     D     E     F      A     B     C     D     E
 1    503   148   262   252   304   -243   780   096   202   088   023
 2    767   235   114   211   007   108    766   279   150   166   209
 3    779   217   -075  082   105   197    790   185   077   127   1%
 4    219   724   215   170   112   147    358   704   186   329   208
 5    213   672   208   361   008   236    363   689   202   406   290
 6    262   718   157   153   193   081    384   640   385   372   059
 7    138   156   617   512   078   203    121   214   827   189   275
 8    023   309   732   148   127   119    265   041   708   330   -002
 9    157   268   169   087   089   508    129   317   546   136   308
10    024   090   392   268   433   -014   308   248   304   587   218
11    095   198   252   702   127   084    006   379   324   694   162
12    258   263   124   693   259   016    339   259   243   6577  397
13    169   231   061   173   394   203    121   151   171   181   686


Two analytic oblique solutions were considered because of the
oblique structure achieved with the graphical solution and the large
number of high intercorrelations present in the original correlation
matrix. An IBM 650 program for the Pinzka-Saunders' oblimax
criteria (Pinzka-Saunders, 1954; Nickles and Keenan, 1960) for
oblique rotation to simple structure was utilized. This is a program
which fits an oblique simple structure, having K dimensions, to a set
of test vectors. The method used is a generalized version of the
quartimax method and is based upon maximizing a function of the
factor loadings which in effect maximizes the “peakedness” of
the distribution of the rotated loadings and their reflections among
the factors. That is, the fewest possible loadings are to account
for the major part of the variance on a factor.
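
One common way of writing such a "peakedness" function is the ratio of the sum of fourth powers of the loadings to the square of their sum of squares. The Python sketch below uses that form as an assumption; the text does not give the exact function used in the IBM 650 program, and the function name is hypothetical.

```python
def oblimax_criterion(loadings):
    """Kurtosis-type ratio: sum of fourth powers over squared sum of squares."""
    values = [x for row in loadings for x in row]
    sum_sq = sum(v ** 2 for v in values)
    sum_4th = sum(v ** 4 for v in values)
    return sum_4th / sum_sq ** 2
```

A few large loadings among many near-zero ones give a high ratio, whereas loadings of uniform size give a low one. Unlike the quartimax case, the denominator matters here because an oblique transformation does not hold the sum of squared loadings fixed.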

The program format provided for only an even number of test
vectors so that one test was deleted from the original 13 tests used.
“Dummy” loadings were not used since one of the tests in the
original battery was included for control purposes and was not
judged as related conceptually or empirically with the other
measures. The results of the Pinzka-Saunders' oblique solution are
presented in Table 5.

Finally, Carroll's biquartimin solution (1957, 1958) was performed
as an alternative oblique method. An IBM 709 program
based on the above solution was used for the analysis. This program
had the advantage of being able to consider an odd number of test
vectors for a factor. Given an arbitrary factor matrix, the object of
this solution is to find a transformation matrix such that the elements
of the resulting product matrix will satisfy the biquartimin
criterion. The biquartimin criterion represents a compromise of a
general class of oblimin methods which are concerned with minimizing
an inner-product function (the sum of cross-products of squared
factor “loadings”) of the columns of the final factor structure
matrix. See Table 6.
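
The inner-product function can be sketched in Python for the oblimin family in general. The weight `gamma` and its value of 0.5 for the biquartimin case follow the usual presentation of Carroll's oblimin class (0 for quartimin, 1 for covarimin) and are stated here as an assumption, since the text does not give the formula; the function name is hypothetical.

```python
def oblimin_criterion(loadings, gamma=0.5):
    """Oblimin-family criterion; gamma = 0.5 is taken as the biquartimin case."""
    n = len(loadings)
    m = len(loadings[0])
    total = 0.0
    for j in range(m):
        for k in range(j + 1, m):
            sq_j = [row[j] ** 2 for row in loadings]
            sq_k = [row[k] ** 2 for row in loadings]
            # Sum of cross-products of squared loadings for this pair of columns.
            cross = sum(a * b for a, b in zip(sq_j, sq_k))
            # Correction term that distinguishes the members of the family.
            total += cross - (gamma / n) * sum(sq_j) * sum(sq_k)
    return total
```

Smaller values indicate that tests loading on one factor tend not to load on another, which is the sense in which the criterion is minimized.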


TABLE 5
Pinzka-Saunders Oblimax Solution
(Reference Structure Loadings V)

      Normal at MA 6 Level                 Retarded at MA 6 Level
Test  A     B     C     D     E     F      A     B     C     D     E
 1    232   035   069   047   138   -177   649   022   076   -077  -131
 2    601   -010  063   014   -126  -023   576   032   -021  -080
 3    675   002   -112  -042  012   144    638   017   -064  -070  -019
 4    -009  510   065   -020  -028  002    028   318   -074  -002  252
 5    005   407   032   151   -074  047    023   284   -085  067   234
 6    003   528   -009  -006  039   -009   069   828   106   094   -023
 7    035   -125  862   145   -012  046    -052  -007  655   -045  107
 8    -091  100   514   -217  062   003    104   010   494   178   -244
 9    208   019   034   -021  092   387    -040  007   405   -108  273
10    109   -025  075   090   355   097    073   -015  011   354   -045
11    -076  017   -006  482   -007  -014   -275  117   -009  475   -044
12    014   055   -163  519   086   -013   115   -128  -042  319   116


Correlations Between Primary Factors

Normal at MA 6 Level
      A     B     C     D     E     F
A           600   435   531   -11   468
B                 60    64    251   344
C                       807   212   515
D                             106   474
E                                   -443
F

Retarded at MA 6 Level
      A     B     C     D     E
A           -125  186   31    254
B                 270   42]   40
C                       563   40
D                             721
E



TABLE 6
Biquartimin Solution
(Reference Structure Loadings V)

      Normal at MA 6 Level                 Retarded at MA 6 Level
Test  A     B     C     D     E     F      A     B     C     D     E
 1    408   034   137   106   126   -321   723   -020  090   -030  -052
 2    691   073   035   086   -112  -013   682   133   007   004   084
 3    712   065   -119  -069  039   089    730   065   -051  006   035
 4    071   598   086   034   -016  018    219   676   017   071   093
 5    059   521   054   230   -054  106    212   557   020   143   066
 6    107   594   023   015   055   -047   240   428   220   118   -136
 7    041   -018  472   360   -040  135    -026  036   738   -049  105
 8    -068  175   634   -018  009   058    136   075   599   150   -090
 9    085   123   120   005   101   438    007   154   461   -074  196
10    074   -020  263   140   830   -041   162   143   141   402   16
11    -023  063   067   600   004   015    -160  310   163   496   04%
12    110   117   -081  576   110   -072   190   124   075   403   204
13    063   107   -043  077   346   149    036   006   101   077   498


Correlations Between Primary Factors

Normal at MA 6 Level
      A     B     C     D     E     F
A           417   229   341   339   142
B                 353   370   362   254
C                       465   368   013
D                             381   062
E                                   -106
F

Retarded at MA 6 Level
      A     B     C     D     E
A           345   358   302   214
B                 436   327   495
C                       431   286
D                             097
E


Discussion

Evaluation of the foregoing solutions naturally has to be
related to common criteria. In this study, well-defined variables
were employed with the expectation that psychologically meaningful
structure could be identified in rotation. Four hypothesized
factors were expected to emerge based on research by Guilford
(1956b) regarding “The Structure of Intellect.” A primary consideration,
then, in the evaluation of these methods of rotation concerns
the interpretability and the meaningfulness of the solution in
question.


Simple structure is usually necessary to a meaningful solution.
The problems of ambiguity embodied in the concept of simple
structure are well recognized. They have been given considerable
attention (Wrigley, 1958; Kaiser, 1958) with the development of
analytical methods of rotation which have been handled in terms
of reducing the principles of simple structure to an objective form
(Harman, 1960). Still a convenient alternative to simple structure
is presently not available although it is recognized that a more
rigorous and logical statement is needed.

As an example of an alternative criterion, Kaiser (1958) presents
an extensive discussion regarding simple structure and factorial
invariance. Although in this discussion the principle of simple structure
is conceived as incidental to the more fundamental concept of
factorial invariance, it is stated that the two concepts are highly
related. Hence, simple structure is still a commonly used standard
for comparison purposes.

A comparison of the solutions from the discussed methods of
rotation demonstrates that four of the five attempted solutions were
satisfactory regarding the identification of the hypothesized factors.
In the Thurstone graphic rotation, it was possible to identify the
hypothesized factors for both the mentally retarded and normal
groups and a good approximation to simple structure was obtained.
However, the intercorrelations among the primary factors were
high, suggesting a bias in the direction of factor axes
which are too highly related as a consequence of obtaining simple
structure.


The quartimax solution furnished predictable results in identifying
a general factor. This type of solution appears to be common
in instances where the intercorrelations between the variables are
as high as was the case in this study. In terms of the criteria
employed, this solution was unsatisfactory.

The normal varimax solution demonstrated that the hypothesized
factors could be identified with the restriction of orthogonality.
Reasonable simple structure was obtained, though several of the
loadings tended to be high and all positive. This indicated
that no solution was possible in terms of well-defined zero
loadings in the factor structure.

The oblimax solution furnished questionable results regarding
the clear identification of factors and limited correlations between
the primary factors. In terms of the large amount of bias of
the oblimax rotation toward highly correlated factor axes, the
limited definition of the factor structure was very disappointing.
Hence, this solution was judged as inferior to both the graphic and
varimax solutions.

The biquartimin solution achieved both an excellent approxima- 
tion to simple structure, identifying all of the hypothesized factors, 
and essentially avoided spurious intercorrelations between the pri- 
mary factors. Differences in factor loadings between the graphic
and biquartimin solutions were confined to the data on the non- 
retarded group and were associated with test vectors 9 and 10. 
These differences were judged as trivial in the interpretation of the 
results. 

The findings reported are fairly consistent with comparisons
found in Harman (1960) concerning analytic methods of rotation. The
biquartimin rotation furnished a very superior oblique solution and 
the normal varimax rotation yielded credible results considering 
the restriction of orthogonality. At this point, an additional analy- 
sis was considered appropriate to compare the results of the normal 
varimax and biquartimin solutions in a situation where the data 
would most probably indicate orthogonal structure rather than 
oblique structure. Of particular interest would be the behavior of 
the biquartimin solution regarding possible bias toward correlated 
factors with no appreciable improvement in approximation to sim- 
ple structure. 

Data were borrowed from a study entitled “The Relation of the 
Degree of Mongolism to the Degree of Subnormality” (Kaariainen, 
in process) using 40 mongoloid children. Thirteen variables were 
intercorrelated using Pearson product-moment correlations and 
factors were extracted by the centroid method. The intercorrela- 
tions were reasonably low in absolute value. A varimax solution was 
attempted by the same routine as discussed previously. The results 
including the original factor matrix are presented in Table 7. In 
addition, the biquartimin solution discussed earlier was attempted 
and the results are presented in Table 8. 

It can be observed that the results from the two solutions are
almost identical. In both solutions an excellent approximation to
simple structure was obtained. In addition, the results can easily be
interpreted. Finally, the intercorrelations between the primary axes
of the biquartimin solution were negligible and indicate that orthogonal
structure was preserved in this situation.


TABLE 7
Centroid Loadings and Varimax Solution for 40 Mongoloid Children
[Table body not recoverable from the source scan. Columns: Centroid Matrix, factors I-VII; Varimax Solution, factors I-VII. Rows: Degree of mongolism; Angle of triradius on right palm; Ratio of length to width of right palm; Angle of triradius on left palm; Ratio of length to width of left palm; Age of subject; Mother's age at birth of subject; Father's age at birth of subject; Pegboard; Form Board; Picture Vocab. Test; Color Tests; Degree of subnormality.]


TABLE 8
Biquartimin Solution
(40 Mongoloid Children)
[Table body not recoverable from the source scan. Columns: Reference Structure Loadings (V), factors I-VII; Correlations Between Primary Factors, factors I-VII. Rows: the same 13 variables as in Table 7.]


In two other analyses not included in this paper, which employed
both the varimax and biquartimin criteria of rotation, the results
were consistent with data presented here. The magnitude of the
intercorrelations between the primary axes in the biquartimin solution
was found consistent with the degree to which the biquartimin
criterion was superior to the varimax criterion in obtaining a good
approximation of simple structure. In other words, the bias of the
biquartimin solution towards “too oblique” factors appears to be
negligible. The only dubious quality noted in all these experiences
with the biquartimin solution has been one instance of a
solution with loadings greater than one as a result of the
extraction and rotation of too many factors. A second attempt at
rotation eliminating the last factor extracted by the centroid method
yielded a very satisfactory biquartimin solution. Kaiser has
stated in personal communications that this “collapse” of the
biquartimin solution in this situation is a characteristic of the
method and is considered advantageous by Carroll as an objective
test of the number of factors used for rotation.


Summary and Conclusion 


The findings in this study support the advantages of both the
biquartimin and varimax methods when compared with hand graphic
rotations. Still, the biquartimin solution appears to be the more
generally appropriate solution if orthogonality is not demanded.
This is supported by the orthogonal structure achieved with a
biquartimin rotation in addition to superior solutions where
orthogonality must apparently be relaxed in the interest of simple
structure.

REFERENCES 


Carroll, J. B. “Biquartimin Criterion for Rotation to Oblique Simple
Structure in Factor Analysis.” Science (1957), 126 (Nov.).
Carroll, J. B. “Oblimin Rotation Solution in Factor Analysis: Computer
Program for the IBM 704.” Harvard University (1958).
Cattell, R. B. and Muerle, J. L. “The ‘Maxplane’ Program for Factor
Rotation to Oblique Simple Structure.” EDUCATIONAL AND
PSYCHOLOGICAL MEASUREMENT, XX (1960), 569-590.
French, J. W. “The Description of Aptitude and Achievement Tests
in Terms of Rotated Factors.” Psychometric Monograph (1951),
No. 5.
Fruchter, B. and Novak, E. “A Comparative Study of Three Methods
of Rotation.” Psychometrika, XXIII (1958), 211-222.
Guilford, J. P. “The Structure of Intellect.” Psychological Bulletin,
LIII (1956b), 267-293.
Harman, H. H. Modern Factor Analysis. Chicago: University of
Chicago Press, 1960.
Kaariainen, R. “The Relation of the Degree of Mongolism to the
Degree of Subnormality.” In process.
Kaiser, H. F. “The Varimax Criterion for Analytic Rotation in
Factor Analysis.” Psychometrika, XXIII (1958), 187-200.
Kaiser, H. F. and Dickman, R. W. “Analytic Determination of
Common Factors.” Unpublished manuscript, 1959.
Marks, A., Michael, W. B., and Kaiser, H. F. “Comparison of Manual
and Analytic Techniques of Rotation in a Factor Analysis
of Aptitude Test Variables.” Psychological Reports, VII (1960),
519-522.
Meyers, C. E., Orpet, R. E., Attwell, A. A. and Dingman, H. F.
“Primary Abilities at Mental Age 6.” In press, 1961.
Neuhaus, J. O. and Wrigley, C. “The Quartimax Method: An Analytic
Approach to Orthogonal Simple Structure.” British Journal
of Statistical Psychology, VII (1954), 81-91.
Nickles, Mary R. and Keenan, T. A. “IBM 650 Program for the
Pinzka-Saunders' Solution for Oblique Rotation to Simple Structure.”
University of Rochester Computing Center, New York, 1960.
Pinzka, C. and Saunders, D. R. “Analytic Rotation to Simple Structure:
II. Extension to Oblique Solution.” Educational Testing
Service, Princeton, New Jersey, 1954.
Thurstone, L. L. Multiple Factor Analysis. Chicago: University of
Chicago Press, 1947.
Wrigley, C. “Objectivity in Factor Analysis.” EDUCATIONAL AND
PSYCHOLOGICAL MEASUREMENT, XVIII (1958), 463-476.
Wrigley, C., Saunders, D. R. and Neuhaus, J. O. “Application of the
Quartimax Method of Rotation to Thurstone's Primary Mental
Abilities Study.” Psychometrika, XXIII (1958), 151-170.
Zimmerman, W. S. “A Simple Graphical Method for Orthogonal
Rotation of Axes.” Psychometrika, XI (1946), 51-55.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT
VOL. XXII, No. 1, 1962


BOOK REVIEWS 


Edited by 


WILLIAM B. MICHAEL 
University of Southern California 


Alexander's Elements of Mathematical Statistics. RAYMOND
A. GARCIA
Mosteller, Rourke, and Thomas' Probability with Statistical
Applications. WILLIAM B. MICHAEL
Goldberg's Probability. L. WESLEY GADDIS AND S. G. SAUNDERS
Wortham and Smith's Practical Statistics in Experimental
Design. L. WESLEY GADDIS AND S. G. SAUNDERS
Rodger's Statistical Reasoning in Psychology, An Introduction
and Guide. JOHN M. IVANOFF
Koenker's Simplified Statistics for Students in Education and
Psychology. RONALD G. RAGSDALE AND JULIAN C. STANLEY
Kendall and Buckland's A Dictionary of Statistical Terms
(Second Edition). WILLIAM B. MICHAEL
Borko's Computer Applications in the Behavioral Sciences.
... versities. HENRY KACZKOWSKI
Peck, Havighurst, Cooper, Lilienthal, and More's The Psy...
Symonds and Jensen's From Adolescent to Adult. WILLIAM
... LEMAN
Lindgren and Byrne's Psychology: An Introduction to the Study
of Human Behavior. ROY M. FITCH
Cattell and Scheier's The Meaning and Measurement of Neuroticism
and Anxiety. BENJAMIN KLEINMUNTZ
Bennett's Personality Assessment and Diagnosis. HAROLD
BORKO
Berte's Essai d'adaptation de l'échelle d'intelligence pour
enfants de D. Wechsler (W.I.S.C.) à des écoliers belges d'expression
française. ANDREW B. CRIDER AND JULIAN C.
STANLEY


| 




Elements of Mathematical Statistics by Howard W. Alexander.
New York: John Wiley and Sons, Inc., 1961. Pp. xi + 367. $7.95.
As Euclid sought to systematize geometry creatively into a logical
scheme, so has Professor Alexander sought to present what in
his opinion constitutes the indispensable fundamentals of
mathematical statistics in a systematic, logically more unified manner.
Thus, the title. Lucidly written and copiously illustrated with
clarifyingly appropriate examples, graphs, and tables, the work
seems to fulfill the author's purpose of creating a well-organized
exposition understandable to students who have had the equivalent
of a year of calculus and no formal study of probability or statistics.
Another of the Wiley publications in statistics, it is consistent with
the generally high quality of the series.
The exposition is divided into 55 sections which are unified under
seven chapter headings: 1. Discrete Distributions; 2. Continuous
Distributions; 3. Sampling Distributions I; 4. Statistical Inference;
5. Sampling Distributions II; 6. Regression and Correlation; 7.
Analysis of Variance. A ten-page appendix on matrices and vectors,
a short briefly annotated bibliography, six tables (Random numbers,
Areas under the unit normal curve, Selected percentiles of the
chi-square distribution, Selected percentiles of the t distribution,
Selected percentiles of the F distribution, and Logarithms of
factorials), answers to odd-numbered exercises, and a detailed
seven-page index end the book. There are about 419 student exercises,
ranging in difficulty from simple to challenging. Approximately 162
examples expand and illustrate basic concepts.
Since this is a new book, incorporating most of the best points of
earlier works but structured so as to overcome many of their
inadequacies, comparing it with other recent texts may be illuminating.
Two such are An Introduction to Mathematical Statistics by
H. D. Brunk (Ginn and Company, 1960), intended for an audience
similar to that of the book being reviewed; and Elementary Statistics
by Paul G. Hoel (John Wiley and Sons, 1960), intended for
students with no more mathematics than high school algebra.
Although Brunk's book is intended for use in a one semester
course and Alexander's in a two semester course, the former
includes a greater variety of less completely developed topics,
such as Statistical Decision Theory, in seventeen generally
shorter chapters covering 403 pages. Where Alexander structures 
his work about the basic concepts of sampling distributions and 
hypothesis testing, and introduces advanced mathematical concepts 
and probability theory where appropriate, Brunk organizes his 
around the ideas of probability as an essential mathematical tool 
and statistics as a decision-making aid employing mathematical 
equipment. Although the topics are arranged in different order,
there seems to be agreement on what is essential. 

Brunk includes about 463 exercises with no answers, and about 
57 examples clearly indicated as such. The disparity in the number 
of examples used may be partly attributed to the fact that Brunk’s 
book seems to present concepts in a manner similar to that of 
mathematical works which emphasize conciseness and comprehen- 
siveness, and introduce illustrative examples only where absolutely 
necessary, while Alexander’s work seems to suggest that the few 
really basic ideas in mathematical statistics must be understood 
thoroughly; his frequent examples help the student tremendously 
in gaining a feel for the subject. Consequently, although Alexander's 
book is more unified, Brunk’s may appeal to the more mathemati- 
cally inclined student, while Alexander's may be more readily 
grasped by most students. However, where Brunk seems to avoid
mention of the gamma and beta functions and thus does not
explicitly develop or state the F and t distributions, Alexander
presents all completely and clearly. Basic omissions such as these, in
Brunk’s and other similar expositions, may make Alexander's more
systematic work more desirable. In the interest of teachability,
neither is consistently rigorous mathematically. 

Hoel’s exposition, requiring considerably less mathematical
sophistication and conceived with the purpose of having a more satis-
factory elementary text prepared by a mathematical statistician 
for a one-semester course in descriptive statistics, includes a chapter
on time series and a more detailed description of nonparametric
tests among thirteen chapters on 261 pages, approximately 2 
exercises with answers to the odd-numbered problems given in the 
back of the text, and from ten to twenty examples per chapter (the 
number depending on what is considered a distinct example) worked 
into the text. Both books make great use of illustrative examples.
The concepts described in Hoel's work are generally developed
relatively more completely and mathematically in Alexander's book.
Where Hoel stresses how to do it, Alexander emphasizes why it is
done as it is. Each seems equally successful in reaching its intended
audience.

The definition of probability given in Chapter 1 on page 28 of
Alexander's book may cause discomfort to some since, after some
practical and historical justification and brief but frank admission
of some difficulties in defining probability in terms of a limit, it is
stated in terms of limits of relative frequencies. Difficulties entailed
by this definition are apparently reduced by introducing, after his-
torical and empirical justification of the terms “equally likely” and 
"equiprobable," the classical definition of probability as a theorem 
on page 31. After the chapter on continuous distributions further 
complicates matters, Alexander turns to a concisely stated abstract 
mathematical definition similar to that of Brunk. 

Comparing Alexander’s definition and theorem with Brunk’s and 
Hoel’s definitions may prove illuminating. Brunk states a version 
of the classical definition on page 5, then gives an abstract formula- 
tion of a probability space on page 22. Hoel states a classical defini- 
tion of probability on page 36 of his work.

Brunk and Alexander use the mathematician's device of assuming
the existence of something troublesome to deal with directly, while
Hoel relies on the classical, and somewhat intuitive, definition.
Alexander attempts to get around difficulties without postulating
several kinds of probabilities, apparently with little more success
than either of the other writers. He is quite successful, however, in
acquainting the student with inherent difficulties.

Probability theory and discrete distributions are presented in an
easily comprehensible manner through the consistent use of set
theory, intuitively developed through Venn diagrams with some
suggestion, by means of examples and a list of the assumptions of
Boolean algebra in Chapter 1, as to what can be done rigorously.
Brunk’s presentation of sets is briefer and not so clear, while Hoel
makes no explicit use of set theory. The entire first chapter covers
the topic of discrete distributions as completely as necessary, even
going so far as to prove Tchebycheff’s inequality. Frequent reference
is made in the first two chapters to a “simple spinner” modified in
different ways to illustrate helpfully the effect of different
assumptions on probabilities.

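For readers who want to see what Tchebycheff's inequality asserts, the following is a minimal numerical sketch. The inequality bounds the probability that a random variable falls k or more standard deviations from its mean by 1/k². The eight-sector spinner and the function names are illustrative assumptions and are not taken from Alexander's text.

```python
from fractions import Fraction


def mean_and_variance(pmf):
    """Return (mean, variance) of a discrete distribution {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, variance


def tail_probability(pmf, k):
    """Exact P(|X - mean| >= k * sd), found by comparing squared distances."""
    mean, variance = mean_and_variance(pmf)
    return sum(p for x, p in pmf.items() if (x - mean) ** 2 >= k * k * variance)


# Hypothetical fair spinner with eight equal sectors numbered 1..8 (an assumption).
spinner = {value: Fraction(1, 8) for value in range(1, 9)}

for k in (Fraction(3, 2), 2, 3):
    exact = tail_probability(spinner, k)
    bound = 1 / Fraction(k) ** 2  # Tchebycheff's upper bound 1/k^2
    print(f"k={float(k):.1f}  exact tail={float(exact):.3f}  bound={float(bound):.3f}")
```

Exact fractions are used so that the comparison with the bound involves no rounding; the exact tail probability never exceeds the bound, which is all the inequality claims.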
The second chapter on continuous distributions is developed in
parallel to the first. The gamma and beta distributions, Stirling’s
approximation, and a continuous form of the binomial function are
developed with the purpose of showing in Chapter 5 that the limiting
form of the binomial distribution is the normal as the number of
observations approaches infinity. The Cauchy, Student's t, [illegible],
and F distributions are introduced as examples in this chapter but
expanded upon as important sampling distributions in Chapter 5.
Because the moment generating function was judged to be a
subject at the advanced calculus level and methods using it would
be better appreciated after exposure to even more complex
procedures such as those requiring generating functions, [illegible]
the student, moment generating functions are left to
very clearly starred supplementary sections well worth
reading and using in any course in mathematical statistics. Brunk
includes moment generating functions as part of the expected work,
while Hoel does not mention them because of the different level of
his book.
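The limiting result mentioned above, that the binomial distribution approaches the normal as the number of observations grows, can be checked numerically. The sketch below is only an illustration under assumed values (p = 0.3 and the three sample sizes are arbitrary choices); it is not the derivation through Stirling's approximation that the review attributes to Alexander.

```python
import math


def binomial_pmf(n, p, k):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_density(x, mean, sd):
    """Density of the normal curve with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def max_abs_gap(n, p):
    """Largest gap between the binomial pmf and the matching normal density."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return max(
        abs(binomial_pmf(n, p, k) - normal_density(k, mean, sd))
        for k in range(n + 1)
    )


# Assumed illustration values: p = 0.3 and three sample sizes.
for n in (10, 100, 1000):
    print(n, max_abs_gap(n, 0.3))
```

The printed gap shrinks as n increases, which is the behavior the limit theorem describes.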

The preface to the book accurately describes its structure and 
may be well worth quoting: 


“... Of the seven chapters, the fourth, on statistical inference, is
pivotal. The aim in the preceding chapters has been primarily 
to develop enough probability and sampling theory so that a 
proper conceptual framework can be provided for statistical 
inference. Chapters 1 and 2 present the probability theory and 
the descriptive statistics of a single random variable. A few 
scattered results concerning sampling distributions are intro- 
duced where there is opportunity. Chapter 3 is devoted primarily 
to the simplest types of sampling distributions and to some of 
the simpler ways of deriving them. This necessitates a full dis- 
cussion of bivariate distributions and some mention of multi- 
variate distributions. After Chapter 4 the exposition divides 
into two main branches: sampling theory on the one hand (to 
which Chapter 5 is devoted) and statistical methodology on the 
other, represented in Chapter 6 by regression and correlation 
theory and in Chapter 7 by the simpler forms of analysis of 
variance. The final sections of these two chapters are, however,
devoted to the distribution theory associated with these
methods.”


The book is generally consistent in quality with the first two 
chapters and consistent with the author's description. It contains 
what most statisticians and teachers of statistics would concede 
constitutes the fundamentals of mathematical statistics. The
sections on nonparametric tests and the chapter on analysis of
variance, though sketching the subjects accurately, are a little too brief.

The most striking feature is not the topics included, for these may
be found in most previous works, but the clear and systematic
exposition. This along with other features may make the subject less
forbidding to many students not mathematically gifted and may
make possible a more mathematical treatment of statistics in the
usually intuitive and descriptive courses in statistics for the social
sciences.

RAYMOND A. GARCIA
Laboratory of Experimental Design
University of Wisconsin


Probability with Statistical Applications by Frederick Mosteller,
Robert E. K. Rourke, and George B. Thomas, Jr. Reading,
Mass.: Addison-Wesley Publishing Company, 1961. Pp. xv +
512. $6.50
On rare occasions individuals whose primary training is in the 
area of mathematical statistics and pure mathematics can write not 
only a sound and comprehensive but also a highly teachable book in 
introductory statistics and probability—a book not only that pro- 
fessors in the behavioral sciences can understand but also that their 
students with one or two courses in psychological statistics can use 
as a basic text in a second or third course in quantitative methods
without developing too much anxiety and strain. Such a volume is
the one prepared by Mosteller, Rourke, and Thomas. It is the
writers’ hope that with minimal background of a second course in
high school algebra the reader will gain three kinds of learning:


... first, an understanding of the kinds of regularity that
occur amid random fluctuations; second, experience in
associating probabilistic models with physical phenomena; and third,
the ability to use these mathematical models to interpret the
physical phenomena and to predict, with appropriate measures
of [illegible], the outcomes of related experiments. (Preface,
p. ix)


Although the writers are probably overly optimistic concerning
the minimum amount of exposure to formal mathematics required
for comprehension of their exposition, in that the added maturity
gained from two years of study of college mathematics would be
highly beneficial in developing facility in forming concepts and in
handling their mathematical notation, nevertheless they have
achieved to an admirable degree their objectives. After four clearly
written introductory chapters upon elementary probability theory,
including one developed in terms of finite sample spaces, the
authors furnish a highly readable fifth chapter concerning
probability functions of random variables, mathematical expectation of a
random variable, and various properties of these distributions
including a proof of Chebyshev’s theorem. Following these five
chapters come three others: one upon joint distributions and continuous
distributions with emphasis on the normal probability distribution,
another upon the binomial distribution including a section on the
central limit theorem, and a third concerning statistical applications
of probability in which problems of estimation and hypothesis
testing are considered without prior information and with some prior
information (involving aspects of Bayesian inference). In Chapter
[illegible] topics in sampling theory are developed that include the
calculation of variances of sums and of averages of random variables
and of correlated variables. Finally in the closing chapter problems
of curve fitting, least squares without consideration of proof
furnished by differential calculus, and regression and prediction
are discussed.

Almost always examples precede the introduction of each new 
concept and further illustrative examples follow most definitions 
and theorems. Problem exercises at the end of each chapter serve 
to reinforce the learning gained from study of formal presentation 
and illustrative examples. 

As may become apparent to the reader of the text, much of the 
material served as the basis for the nationally televised course in 
Probability and Statistics presented early in 1961 on the NBC 
Continental Classroom. In addition to these built-in pedagogical 
advantages, the writers have had preparatory experiences as mem- 
bers of the Commission on Mathematics of the College Entrance 
Examination Board that undoubtedly made them acutely aware 
of the importance of writing a modern and forward-looking text.
Perhaps a series of objective examinations will be forthcoming to 
accompany each of the chapters to facilitate an instructor’s evalua- 
tion of his students’ grasp of the fundamental concepts presented. 

Several of the noteworthy miscellaneous features include two
appendices in which principles of elementary set theory and
techniques of expressing mathematical ideas through use of summations
and subscript notations are set forth; tables of 2500 random digits,
of values of n! and log n!, of areas under the normal curve for
abscissa values in standard score form to hundredths, of three-place
entries of both individual terms and cumulative terms of the
binomial; a chart of the 95 per cent confidence limits on p, the
probability of success on a single binomial trial; answers to
even-numbered exercises; a bibliography of 27 references grouped in
terms of levels of mathematical sophistication required for their
reading; and on the inside pages of the cover of the book a useful
summary of key formulas, axioms, rules, and definitions as well as a
glossary of terms cited within the 400 pages of the text; a [illegible]
appendix on a theorem of independence; and two tables subsequent
to the index on what would be an extra page before the rear cover of
the book in which cumulative proportions and corresponding values
of normal deviates are noted.

In addition to giving the bright undergraduate student in
engineering, physical science, and mathematics an over-all view of the
nature of probability and statistical inference, this book should be
of considerable aid and value to gifted undergraduate and
intellectually superior graduate students in psychology who are
concerned with the use of probabilistic models and aspects of set theory
found in statistical learning theory and decision-making processes.
One cannot fail to see that the book is accurate, up-to-date, balanced
in its selection of topics, and logically organized.

Despite reactions concerning its inherent difficulty and its
abstract nature that the reviewer has received from several of his
more able graduate students, its pedagogical advantages far
outweigh its disadvantages. For several years to come the volume
should serve as a useful standard or model for a one or possibly
two semester basic course in mathematical statistics with the
contents of which most advanced graduate students in experimental
psychology and psychometrics should probably be familiar. In any
event its values as a reference aid for curious students and informed
teachers of elementary statistics cannot be minimized.

WILLIAM B. MICHAEL 

University of Southern California 


Probability by Samuel Goldberg. Englewood Cliffs, New Jersey: 
Prentice-Hall Inc., 1960. Pp. ix + 322. $6.75 

This text, which is specifically oriented toward the newcomer to
probability theory and probabilistic models, is remarkably
adaptable to quantitative evaluation of psychological and physiological
problems. It includes many sections recently tested in college classes
and in mathematics institutes sponsored by the National Science
Foundation. Further, it appears as if most of the essentials,
including correlation and elements of sampling theory, are covered for
persons who will use probability concepts and statistical techniques.
The text opens with one of the better discussions of set theory,
including a very basic but adequate treatment of the mathematics
of sets. A most outstanding and lucid discussion with examples is
given in this section on the algebra of sets, and the descriptive
material in this one chapter is possibly the highlight of the book.

In Chapter 2, fundamental definitions and problem examples are
given which further aid the serious student to develop a suitable
mathematical way by which an experiment may be designed.
Specifically, the topics treated include sample spaces, events,
conditional probability, independent events, independent trials, and the
chapter concludes with an excellent probability model in the field
of genetics.

The text continues with a chapter on sophisticated counting 
techniques and probability problems, random variables, binomial 
distribution with applications, and concludes with a section dealing 
with testing of statistical hypotheses.

With the exception of Chapter 1, and the section in Chapter 5
which deals with developing and testing a statistical hypothesis,
this text would not appear to be of value to the typical [illegible]
psychologist unless he is well trained in mathematics. It
would be of value, however, to engineering or economics students
entering the field of mathematical analysis and to those persons
who are in the field of engineering reliability. In addition, those
persons competent in the area of experimental design in the
[illegible] and social sciences would find it of interest, although the
specific area of experimental design is a weak point in the text.
Major strengths are discussions of such topics as random variables,
standard deviations, and binomial probability distributions.
Admittedly, this text furnishes a reliable foundation for more
advanced studies in probability and statistics.
L. WESLEY GADDIS AND S. G. SAUNDERS
Aerojet-General Corporation 


Practical Statistics in Experimental Design by A. W. Wortham and 
T. E. Smith. Dallas, Texas: Dallas Publishing House, 1959. Pp. 
ix + 128. 

Although it is a cookbook type of applied practical statistics, this 
text is a good short-form working manual which will assist re- 
searchers in developing an experimental design in order that specific 
quantitative engineering problems may be evaluated. This is true in 
terms of quality control operations, but is not true in terms of de- 
tails of developing probabilistic models or mathematical formula- 
tions of experimental designs. 

To the engineer-statistician who is assigned a quality control 
task, this text will aid not only in identifying the important param- 
eters to be considered but also in determining the statistical ap- 
proach to evaluating the problem at hand. Because of the existing 
format of the text, the user will not be required to make specific 
additional reference to texts in mathematical statistics. Simple
explanations are given on how to compute various statistics, and the
short form tables of random numbers, t values, and F values will be
of use to all individuals in the field of quality control. 

The principal strengths of this text are the sequential
arrangement of the material, the illustrations, and references. The short but
purely descriptive glossary aids the user in specifying in statistical
language reasonably well what he intends to say to a trained
mathematical statistician. Unfortunately a thorough analysis of
precisely what is meant by experimental design is not given, and
perhaps the comments which are made may be misleading.

For a beginner in the statistical aspects of operational quality 
control this text would be of value. This could also be true for the 
behavioral scientist who is not interested in mathematical aspects 
of psychology but who should understand the meanings and
requirements of a good experimental design.

L. WESLEY GADDIS AND S. G. SAUNDERS
Aerojet-General Corporation 


Statistical Reasoning in Psychology, An Introduction and Guide
by R. S. Rodger. London: University Tutorial Press, Ltd., 1961.
Pp. 204. 15/-.
In typically English fashion, Rodger has been able to compress
a great deal of knowledge into a very slim volume. This concisely,
but carefully, written text is an introductory investigation into the
statistical theory of normal variation where the author has
attempted to give the reader “a conceptual model of statistical method
in its most advanced form, rather than a compendium of useful
techniques whose adequacy and limitation he may not understand.”
The text provides for those who desire critically to examine
statistical techniques. It assumes that the student has no knowledge of
statistics and a somewhat limited background in mathematics; yet
it certainly avoids the typical “cookbook” approach customarily
employed in beginning texts which assume no mathematical
background.

Although the author tends to avoid those technicalities essential 
to the work of the mathematical statistician, he has selected aspects 
of same which seem essential for prudent use of statistical
techniques and he explores them in non-mathematical language and
with ample reference material cited. 

The text is written primarily for students in psychology;
however, it should not deter others in the social sciences from benefiting
from it, since the amount of psychology contained is minimal.
Nonparametrics are being used more and more by the psychologist for
his analyses of his experimental results. Nevertheless, Rodger feels
that “normal” theory provides a more thorough conceptual model
than any other. It is his aim that this text will aid the psychologist
“to understand when ‘normal’ theory is, or is not, adequate to his
needs. If he uses normal theory as an approximation to his ideal
requirement, he should know how approximate it is.”

Rodger’s emphasis is on the “organization of statistical theory
rather than on its detailed results.” Teachers secure in statistical
methods should find this text a good basis for further explanation
and exploration of the theory. However, it seems that beginning
students, in spite of the abundance of pertinent references presented
to guide them into statistical literature, may become discouraged by
the brevity of the text as a basic guide and may prefer a text with
more detailed exposition, more illustrations of the application of
statistical methods, and less dependency on reference material.

The paucity of the exercises provided in the text (43 [illegible])
is in keeping with Rodger's intent, but may leave the [illegible]
student with a less than desirable amount of practice in [illegible]
application of the subject matter if the instructor restricts
application efforts to these exercises. In all fairness, though, the
exercises appear challenging, avoid laborious computation, and are
psychologically realistic.

The usual [illegible] description of distributions [illegible] in
significance [illegible] (Bernoullian binomial, chi square, F, normal
and t-distribution, as well as means, medians, correlation
coefficients, variances, semi-interquartile ranges, and regression
equations) are presented. The
underlying assumptions made in constructing graphs, in computing
estimates, and in testing them for significance are adequately
treated, as well as are the important limitations brought about by
measurement scale properties.

Since Rodger's emphasis is one concerning the use of statistics in 
experimental work, he includes essential discussion of statistical 
hypotheses, explanatory hypotheses, existing and hypothetical pop- 
ulations, statistical, logical and psychological probabilities, rejection 
regions, and decision errors.

The text appears to be successful as an introductory guide. It 
offers a brief, but accurate and sound theoretical foundation upon 
which to build “a conceptual model of statistical method in its most 
advanced form.” 

JOHN M. IVANOFF
Marquette University 


Simplified Statistics for Students in Education and Psychology by
Robert H. Koenker. Bloomington, Illinois: McKnight &
McKnight Publishing Company, 1961. Pp. viii + 167. $3.00

This is a splendidly legible, attractively printed paperback 
manual that uses a pink background well to highlight topic headings 
and several graphs. It offers numerous aids to the student, including 
full answers to 124 problems. Some teachers of elementary educa-
tional statistics may find it a useful workbook-type supplement for 
a textbook that explains more fully the rationale of statistical 
method. 

“The actual solutions of most statistical problems are quite
simple. The calculation of the square root (a ninth grade skill) is
probably the most difficult arithmetic skill involved” (page v).
These sentences from the preface of this attempt at simplifying
statistics are indicative of the approach used. The emphasis is on
how to compute statistics, with little attention to knowing why one
would want to compute them or which statistics one would want
to compute in a particular case.

The preface is also somewhat misleading in its statement con- 
cerning the square root, for symbolism and vocabulary may prove 
difficult at times. Although there are two pages of instruction given
to the topic of “Finding the Square Root,” the explanation of Σ
notation is confined to one line on p. 4: “Σ is the Greek capital letter
and stands for the sum of.” It seems unlikely that this explanation
will be sufficient to prevent the usual consternation of students first
confronted by such a symbol.

This is not the only evidence that the book at times presupposes
knowledge beyond the ninth grade level. A Quartile Deviation is
defined on p. 9 as “one half the interquartile range,” with no further
explanation of interquartile range.
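Since the quoted definition leans on a term it does not define, a short worked sketch may help. The interquartile range is the distance between the first and third quartiles, and the quartile deviation is half of that distance. The scores below are invented for illustration, and the "inclusive" quartile convention is only one of several in use; other conventions can give slightly different values.

```python
import statistics


def quartile_deviation(scores):
    """Half the interquartile range (Q3 - Q1), using inclusive quartiles."""
    q1, _median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    interquartile_range = q3 - q1
    return interquartile_range / 2


# Hypothetical test scores, not taken from Koenker's manual.
scores = [52, 55, 61, 64, 68, 70, 73, 77, 85]
print(quartile_deviation(scores))
```

For these nine scores the first and third quartiles fall on the third and seventh ordered values (61 and 73), so the interquartile range is 12 and the quartile deviation is 6.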

The main problem, however, is not that people with poor
mathematical backgrounds will be unable to use this book, but rather
that they will use it. Education and psychology already have an
abundance of people who can and do perform various statistical
tests without ever knowing whether they are the appropriate ones.
It would be better if a book of this size would attempt to give
students a knowledge of statistics that would enable them to
read literature in which statistics play an important role,
particularly test manuals, without giving them the idea that they are
qualified to manufacture their own statistical evidence.

Thus far we have been concerned mainly with the purpose of the
book. Let us now investigate some of its more easily detected
inaccuracies.


On p. 9 the average or mean deviation is defined as “the average
of the deviations from the mean” with no mention of absolute
values. The naive student of statistics, for which this book is
intended, could easily misinterpret this definition.

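The omission of absolute values matters because signed deviations from the mean always sum to zero. A minimal sketch with invented scores (the data and function names are illustrative assumptions) shows the difference between the literal reading and the intended one:

```python
def mean_signed_deviation(scores):
    """Literal reading of the definition: average of (score - mean)."""
    mean = sum(scores) / len(scores)
    return sum(x - mean for x in scores) / len(scores)


def mean_absolute_deviation(scores):
    """Intended statistic: average of |score - mean|."""
    mean = sum(scores) / len(scores)
    return sum(abs(x - mean) for x in scores) / len(scores)


# Hypothetical scores with mean 5.
scores = [2, 4, 4, 5, 10]
print(mean_signed_deviation(scores))    # the signed deviations cancel
print(mean_absolute_deviation(scores))  # the absolute deviations do not
```

With these scores the signed version is 0 and the absolute version is 2, so a student following the definition as printed would obtain zero for every data set.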
On p. 83 we read that “Standard scores are easily averaged.”
This of course is true, just as it would be “easy” to average the
standard deviation with the mean. The important thing to note is
that in both cases the averaged measures lose their original
meaning. This, in the case of standard scores, is easily illustrated by an
example. Suppose there was a person who received a standard (z)
score of plus one on each of 100 achievement and ability tests (both
mental and physical) administered to him. This individual would
be most phenomenal in terms of his consistently high
scores; yet his average standard score would be only plus one.

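One way to put a number on this example is to ask how far an average z of plus one lies from the mean of the distribution of such averages. Assuming each test is standardized and the pairwise correlations among the k tests average r, the average of the k standard scores has standard deviation sqrt((1 + (k - 1) r) / k). The correlation values below are assumptions chosen for illustration; the reviewers give none.

```python
import math


def sd_of_mean_z(k, mean_r):
    """SD of the average of k standard scores with average intercorrelation mean_r."""
    return math.sqrt((1 + (k - 1) * mean_r) / k)


def restandardized(mean_z, k, mean_r):
    """Express an average z in units of the SD of such averages."""
    return mean_z / sd_of_mean_z(k, mean_r)


# An average z of +1.0 over 100 tests, under three assumed intercorrelations.
for mean_r in (0.0, 0.3, 1.0):
    print(mean_r, round(restandardized(1.0, 100, mean_r), 2))
```

Only when the tests are perfectly correlated does an average of plus one keep its usual meaning; for any lower intercorrelation the same average marks a more extreme standing, which is the sense in which averaged standard scores lose their original meaning.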
On p. 85 a nonsignificant difference is twice referred to as an
“insignificant” difference, an adjective that can lead to varied
misinterpretations of statistical results, especially since no statements
concerning the power of statistical tests are made.

On p. 86 and various pages following, reference is made to the
decision to “accept the null hypothesis.” This is, of course, an
impossibility using the t tests being discussed. The only possible
decisions are to reject the null hypothesis or to fail to reject it,
[illegible] as stated there, “the [null] hypothesis is either accepted,
rejected, or we remain in doubt.”

Also on p. 86 is the following statement: “When [illegible]
of independent groups the t test is not valid unless the
standard deviations of the two groups are equal. [illegible]
value of t is significant at the .01 level of probability the possibility
of the standard deviations not being equivalent is [illegible]”
At this point a footnote refers the reader to
Lindquist's 1940 Statistical Analysis in Educational Research for
support of the statement. What Lindquist actually says in the
indicated portion is that, when the value of t is significant at the .01
level of probability, the possibility that all of the difference is
caused by the difference in standard deviations and none by the
difference in means is quite remote.

What is needed today is not a book that tells a teacher how to 
check the distribution of her grades by using a chi-square “good- 
ness of fit” test (as this one does), but one that explains carefully 
the meaning of statistical terms she is likely to run across in the 
course of her work, without giving the impression that every 
classroom test should be given a complete statistical analysis. 
Most educators should be consumers of statistics in test manuals 
and professional literature, rather than generators of any but the 
simplest statistics themselves. In order to employ such statistics as 
t and χ² properly, they need more understanding than computa-
tional manuals such as Koenker’s—however carefully prepared—
can help them acquire. The original fault is not Koenker's, for he is 
merely following a persistent “cookbook” formula set up long ago. 
We need newer, better recipes! 

RONALD C. RAGSDALE AND JULIAN C. STANLEY
Laboratory of Experimental Design 
University of Wisconsin 


A Dictionary of Statistical Terms (Second Edition) by Maurice G. 
Kendall and William R. Buckland. New York: Hafner Publish- 
ing Company, 1960. Pp. xi + 575. 

Representing the outgrowth of the efforts of a group of nearly 30 
distinguished members of the International Statistical Institute, A
Dictionary of Statistical Terms was intended to provide an explanation
of terms in modern use irrespective of whether each one could
be considered desirable in light of recent knowledge. For the most
part definitions are expressed in verbal terms—from one to six or
eight sentences—with minimal dependence upon use of mathematical
formulae. In addition to four index glossaries in French-English,
German-English, Italian-English, and Spanish-English, there is
in the Second Edition a combined glossary of 82 pages in which an
alphabetical list of English terms is matched with revised equivalent
terms in French, German, Italian, and Spanish. Unfortunately there
are no Russian words cited. However, the doctoral student in statistics
or in the behavioral sciences who is faced with the prospect of
taking a reading examination in any one of the four other languages
will certainly be delighted with the dictionary, as will be the
research worker who has need to read non-English articles.

The behavioral scientist concerned with problems of experimental
design, factor analysis, and regression and correlation theory will
find many of his favorite expressions described. As might be expected
there are several omissions of statistical terms in test theory
and psychometrics that are obviously of a highly specialized nature.
Synonyms in the form of cross references are often cited as in the 
instances of characteristic roots, latent roots, and eigenvalues. Par- 
ticularly useful will the dictionary be to the research psychologist 
who occasionally encounters, in his reading, words of a statistical 
flavor in other disciplines such as agronomy, economics, and engi-
neering. It may be surprising to many, as it was to the reviewer, to
note how frequently the dictionary definition clarifies certain am- 
biguities or uncertainties over the meanings of many terms that one 
may have been using somewhat inexactly or even somewhat glibly. 
The reviewer would urge that anyone with a serious interest in statistics
be sure to obtain this volume for his library, since one may
expect to refer to it frequently.


WILLIAM B. MICHAEL
University of Southern California 


Computer Applications in the Behavioral Sciences by Harold Borko 
(Editor). Englewood Cliffs, New Jersey: Prentice Hall, Inc. Pp. 
607. $8.75. 

Perhaps the most illuminating document to appear in several
years, Computer Applications in the Behavioral Sciences will be of
major interest to those persons who are active researchers in psychology
and education. Many volumes have been written on the
engineering and design aspects of computers. Other material has
been concerned with mathematical and programming problems, and
still others have dealt with business applications.

Only a limited amount of material is available to the researcher
who is not a specialist in computers, and even less is available which
[...] computer applications of interest to the behavioral scientist.
Many investigators are actively concerned in the development of
computer technology, both in its effects on society and on his own
[...] interests. The requirement for such information is now
well satisfied with this text.
The focus of the introductory chapter is directed toward establishing
definitive differences and relationships between [...]
(and computer technology). A comprehensive [...]
relationships is presented, and the wide coverage of the
theory and practice of quantification of psychological pro[...]
will enhance its value for those in the behavioral sciences.
The stated intent of this book is to introduce the [...]
language and usage of computers. This [...] accomplishes
this admirably, but also goes beyond this [...]. The [...]
of analogies, as well as the presentation of the capabilities
and limitations of computers, makes it [...] pleasant reading.
Since portions of this sophisticated [...]
require careful reading, these sections will satisfy the queries of
even the most experienced computer analyst.

Many present day behavioral scientists tend to view their more 
astute contemporaries—who use computers and the specialized 
mathematical statistical techniques, as well as the vocabulary 
peculiar to computers—just as they also view the use of opium. 
Unfortunately, they may be somewhat correct, for some psycholo- 
gists appear to step into the field of mathematics and computers 
merely to impress their colleagues or the lay public. This is not true 
of the writers of this text, since they clearly show the complemen- 
tary roles of computer techniques and applied mathematical sta- 
tistics in quantitative psychology. 

The book provides most of the technology necessary for the use- 
ful implementation of a large scale computer; yet the writers are 
careful to establish the requirement of a good background in sta- 
tistical inference. 

The volume is organized in three parts. It begins with a general 
discussion of the role of computers in the behavioral sciences, which 
is followed by an examination of the nature of mechanical thought. 
Next, a brief history of automatic data processing is presented so
that current developments can be viewed in their proper perspective.
The concluding chapter in this first part deals with computer
principles and applications. 

Part II presents a somewhat detailed description of the organiza- 
tion of a computer, numbering systems, machine language, and an 
introduction to programming concepts.

The reader, after having completed this material, has made his
acquaintance with computers and the terminology used by workers
in the field. He is now ready to look into some of the original and
interesting work being pursued with the aid of computers. Part III
provides him with this opportunity, and acquaints him with the
work of a number of eminent researchers. A wide variety of computer
applications is presented. The material is organized into
several groups of chapters so that Chapters 8 and 9 deal with
general implications and review data processing in the fields of
psychological research; Chapters 10, 11, and 12 are concerned with
statistical computations, i.e., multiple linear regression models,
factor analysis, and canonical analysis. The next six chapters, 13
through 18, discuss the use of the computer as a research technique
for investigating areas of traditional concern to the psychologist
such as perception, education, cognition, language behavior, and
creativity. The employment of computers as a simulation technique
in the creation of neurophysiological and social system models is
described in Chapters 19 through 24. The concluding chapter attempts
to look into the future, predict trends, and anticipate problems.




In conclusion, it appears as if this volume will be an excellent 
reference source for an appreciable length of time, for all scientists,
and it is of considerable value to active investigators in the
behavioral sciences today.


L. WESLEY GADDIS 
Aerojet-General Corporation


A Primer of Programming for Digital Computers by Marshall H.
Wrubel. New York: McGraw-Hill Book Company, 1959. Pp. 
xv + 230. $2.95. 

After considerable difficulty with the teaching of principles and
techniques of computer programming for IBM equipment from an
all but incomprehensible manual, the reviewer was delighted to
have the opportunity of using Wrubel's book. The author starts at
the beginning instead of the middle, and instructs the reader in a
systematic step by step fashion. Explanations of important concepts
are clear and concise; the resulting development is an excellent
introduction to the subject of high speed computing, a field
simple in principle but intricate in its details.

The novice at the use of high speed computers should be forewarned
that there is, today, a bewildering variety both in kinds of
machines and in methods for operating a given machine. In the
interest of clarity, Wrubel for the most part confines his descriptions
to the model IBM 650 and three codes; however, the beginner
who masters these will have a firm grasp of the basic principles
underlying all digital electronic calculators.
Professor Wrubel's volume is organized into eight carefully written
chapters. The remainder of this review attempts to summarize
adequately the contents of these chapters.

The first chapter is introductory, and brings the reader's attention
to useful background from the field of numerical analysis. The
term “program” is explained as meaning the set of detailed instructions
which tells the machine exactly what to do with the data.
The most important material expounded here is the “hardware,”
the components of IBM’s type 650 machine, broken down as
follows:

1. The input (or hopper) which accepts data and instructions,
2. The output, which prints out the final answers,
3. The memory, which consists of 1000 addresses, each a pigeon
hole in which a 10 digit number can be stored.


Chapter 2 deals with the L-1 code developed by Wolontis at
the Bell Telephone Laboratories. In this system the numbers 1, 2,
3, and 4 stand for the operations +, −, ×, and ÷. Thus, the instruction
3 201 202 203 would tell the machine to multiply the number
in address 201 by the number in address 202, and then put the
product in 203.
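
The three-address instruction format just described can be imitated in a few lines. What follows is a toy model under stated assumptions, not the actual L-1 system: it assumes the codes 1 through 4 map to the four arithmetic operations in the order listed, it uses ordinary Python numbers rather than the machine's 10-digit words, and the names `OPERATIONS` and `run` are hypothetical.

```python
import operator

# Operation codes as read from the review: 1, 2, 3, 4 for add, subtract,
# multiply, divide. Anything beyond these four is outside this sketch.
OPERATIONS = {1: operator.add, 2: operator.sub, 3: operator.mul, 4: operator.truediv}


def run(program, memory):
    """Execute instructions of the form 'op a b c': memory[c] = memory[a] op memory[b]."""
    for instruction in program:
        op, a, b, c = (int(field) for field in instruction.split())
        memory[c] = OPERATIONS[op](memory[a], memory[b])
    return memory


# 1000 addresses, matching the review's description of the 650 memory.
memory = [0] * 1000
memory[201] = 6
memory[202] = 7

run(["3 201 202 203"], memory)
print(memory[203])  # product of addresses 201 and 202, stored in 203
```

Keeping the operation table separate from the loop means a further code would need only one new dictionary entry.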
Chapter 3 is concerned with “loops” and “branches.” These two 
topics are elaborations of the L-1 system, which render more
convenient the programming of problems involving repetitious 
manipulations of the data. 
Chapter 4 is entitled Flow Diagrams, Subroutines, and the Pro- 
gram Library. A brief explanation of these three concepts would 
seem to be in order. A flow diagram is a preliminary sketch of the 
plan by means of which the machine is going to solve a problem. 
A problem, once solved in detail, can be permanently recorded, and 
then used later on in the solution of a larger problem. This perma- 
nent record is called a subroutine. A collection of subroutines, care- 
fully stored and catalogued, is known as a program library. 
In Chapter 5 the author discusses the technique of locating errors 
in programming, a difficult process referred to as “debugging” in 
machine vernacular. The two most important methods are called 
the sample calculation and the memory dump. The sample calcula- 
tion approach is essentially a dry run in which data for which the 
answer is already known are processed by the machine. The differ- 
ence between the machine's answer and the correct answer is then 
traced back through various checkpoints in the program. In the 
memory dump method, any portion of the memory at any phase in
the running of a problem can be printed out for visual inspection. 
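
The two debugging methods can be sketched together. The example below is a hypothetical illustration, not code from Wrubel's book: `buggy_mean` stands in for a program with a planted error, address 105 serves as a checkpoint, and `memory_dump` is an assumed helper that formats a range of addresses for inspection.

```python
def buggy_mean(memory):
    """Hypothetical program: mean of addresses 100-102, result in address 110."""
    memory[105] = memory[100] + memory[101] + memory[102]  # checkpoint: running sum
    memory[110] = memory[105] / 2  # planted bug: should divide by 3
    return memory


def memory_dump(memory, start, stop):
    """Format the contents of addresses start..stop-1 for visual inspection."""
    return [f"{address:04d}: {memory[address]}" for address in range(start, stop)]


# Sample calculation: data whose correct answer is known in advance.
memory = [0] * 1000
memory[100], memory[101], memory[102] = 3, 6, 9
known_answer = 6

buggy_mean(memory)
discrepancy = memory[110] - known_answer

# Dumping the checkpoint shows the sum is correct, so the fault lies after it.
for line in memory_dump(memory, 105, 106):
    print(line)
```

A nonzero discrepancy signals an error somewhere; the dump of the checkpoint address then narrows it to the step after the sum was formed.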
In Chapter 6 an ingenious idea is presented which helps eliminate 
the formation of errors in the first place. Mistakes in programming
are ordinarily human in source and occur when the programmer
translates the algebra of the problem into machine language. Auto- 
matic programming utilizes the 650 machine itself to perform the 
translation, by means of an auxiliary code called FORTRANSIT. 
The last two chapters of the book (7 and 8) present a viewpoint
directly opposite to that of automatic programming. There exist
circumstances in which the programmer wishes to eliminate intermediate
devices and to work as directly as possible with the basic
gadgetry of the machine. One code capable of accomplishing this is
called SOAP, which is an advanced technique used when the programmer
finds that he must increase available storage capacity,
decrease running time, or modify an existing program without having
to resort to complete reassembly.
M. C. PERRY
Auburn University


The Process of Education by Jerome S. Bruner. Cambridge: Harvard
University Press, 1960. Pp. xvi + 97.

The National Academy of Sciences, in 1959, arranged for a
Woods Hole, Massachusetts, meeting of thirty-five scholars in
Arts and Sciences; their stated intention was “to examine the
fundamental processes involved in imparting to young students a 
sense of the substance and the method of science.” This book, a
“chairman's report,” comprises Bruner’s summary and mediation
of the conference, pre-conference activities, and post-conference
discussions, conferences, and correspondence. The result is a highly
readable, bold, and heuristic document that, coupled with the stature
of the author and his colleagues, is likely to have an identifiable
effect on the language and organizing concepts of the professional
educator. The boldness of the treatment is represented in Bruner's
unblushing willingness to state tersely and strongly propositions
which a more anal intellect would withhold as “premature,” “dogmatic,”
and “over-generalization.” The heurism of the book lies in
the union of this temerity and the assumption-challenging reformulations
that constitute the body of the material.

The book is organized around four basic “themes,” each accorded
a chapter: (1) That the curriculum of a subject should be determined
by the most fundamental understanding that can be achieved
of the underlying principles that give structure to that subject; (2)
that “any subject can be taught effectively in some intellectually
honest form to any child at any stage of development”; (3) that
intuitive thinking is a necessary aspect of productive thinking both
in academic and in everyday tasks; (4) that interest in the material
to be learned is the best stimulus to learning.

One of Bruner's major propositions became a topic of widespread
and common conversation immediately after publication of the
book: The assertion that the foundations of any subject can be
taught to anybody at any age in one of its forms. This notion is
understandably startling, and perhaps even threatening, to anyone
who has adopted the conventional professional stance of assuming
that (a) the basic and powerful ideas of science are, pari passu,
impossibly complex for some levels of instruction and for some pupils
at any level; (b) the crucial test of a pupil's educability is the success
of the teacher in “teaching him.” It appears, therefore, that
even if Bruner is riding the wrong horse, he is riding in the right
direction and applying the spurs at a crucial spot. Equally [...]
is the premise underlying most of the discussion [...]
“That intellectual activity anywhere is the same, whether at
the frontier of knowledge or in a third-grade classroom. What a
scientist does at his desk or in his laboratory, what a literary critic
does in reading a poem, are of the same order as what anybody else
does when he is engaged in like activities—if he is to achieve
understanding. The difference is in degree, not in kind. The school boy
learning physics is a physicist, and it is easier for him to learn
physics behaving like a physicist than doing something else.”
[...]
theme, will sound like old friends to those who understood and were
persuaded by Dewey's formulations earlier in the century. It is true 
that Bruner is more inclined to find his stimulus in the “structure 
of the subject matter," but the statement quoted immediately 
above implies what the fourth theme makes explicit—that the
structure of a field can generate appropriate motives for learning.
Bruner is an apt phrase-maker and the text is sprinkled with 
quotable statements—many of which are excellent candidates for 
the arsenal of epigrams in education (“An unconnected set of facts
has a pitiably short half-life in memory."). Furthermore, even 
within the limits of what is a very brief book, we are offered new 
linguistic conventions, distinctions, and analyses covering a large 
portion of the phenomena of interest to the educator. A particularly 
interesting example of the latter, for instance, is Bruner's conception 
of “the act of learning” as involving three somewhat simultaneous 
processes: Acquisition of new information, transformation of infor- 
mation, and evaluation of the transformations. The book, whether 
intentionally or not, also offers a slogan system more systematic
and promising than those now in vogue. Indeed, as Komisar has 
recently suggested, “This book represents an almost classic case of 
the natural history of a slogan system. . . . If Mr. Bruner and his 
associates can provide the necessary interpretations of the slogan 
system when called on, they may perform a most valuable function,
perhaps providing at least a linguistic basis for overcoming the 
tragic fragmentation between academics and educationists.”¹
ARTHUR P. COLADARCI
Stanford University

¹ Komisar, B. Paul. “The Logic of Slogans.” In B. O. Smith and R. H. Ennis,
Language and Concepts in Education. Rand-McNally, 1960, Chapter 13.


Using Tests in Counseling by Leo Goldman. New York: Appleton-
Century-Crofts, Inc., 1961. Pp. 434. $7.00.

Are tests generally misused? Oyez, oyez! Is it better to do without
tests than to misuse them? Oyez, oyez! Is it even better to learn
how to use them correctly? Oyez, oyez!

Goldman lays the proposition squarely on the line. Let us counseling
psychologists put up or shut up about tests. Why? For the
protection of the community against blatant malpractice. But are not
tests so crude that professional training is unnecessary? Then such
instruments have no place in an assessment and counseling program.

Goldman's alternative to wholesale test destruction is, of course,
this book. His goal is to develop skill in the interpretation of tests in
individual counseling through a synthesis of statistical and clinical
approaches. His designated audiences are practicing guidance counselors
and counseling psychologists in schools, colleges, the VA, and
other agencies. The size of the potential audience may be diminished,
however, if it is limited to those who have what the author deems to 
be a minimal level of professional education, namely, a foundation 
in relevant areas of psychology and sociology, a background in ele- 
mentary statistics, and an understanding of the basic principles of 
tests and measurement.

The basic model of client-counselor interaction is set within a 
framework of decision-theory similar to that proposed by Cronbach. 
The purpose and major implications of this model appear to be 
somewhat as follows: 


The purpose of testing is to aid decision-making by providing in- 
formation which is valid for the action in question. Although the 
client is the one who decides what outcomes or goals are to be con- 
sidered, the counselor has the knowledge of validities which is 
needed to decide whether the desired information can be obtained
from tests or other sources. When a test has been prescribed, then,
it is assumed that the counselor has decided that existing sources of
information are inadequate and that the test has something to add.
The conventional test validity coefficient does not answer the question
of practical utility since it reports only improvement of decisions
over chance. Therefore, before prescribing testing, the counselor
must somehow determine validity coefficients for decisions
made without tests, compare them with conventional test coefficients,
and decide that the test is either a more valid source of the
same information, or a valid source of new information.

The problem is simplified considerably, of course, if one assumes
that the client and counselor begin with zero information. Under
this condition, the use of any test with a nonchance validity coefficient
is justified. However, one can just as easily make a contrary
assumption, namely, that self-determined, nontest decisions are
likely to have validity coefficients at least as large as those of the
average test. Goldman takes a third position, namely, that counselors
can develop the ability to “synthesize” clinical hunches and
statistical inferences. The reviewer's point is simply that by judicious
selection of unsupported basic assumptions, one can justify
any program.

If one is comfortable with Goldman's basic assumptions, however,
this book has much to offer. Topics covered include test selection,
administration, scoring, interpretation, and communication of
interpretations. An attempt is made to base conclusions on research
evidence wherever possible. This is the major strength of the book,
and it produces a bibliography of about 350 items.

Chapters 3 and 4 discuss test selection as an integral part of the
counseling process. Client participation is favored on the grounds
that it increases involvement, decreases dependence, provides additional
diagnostic data and decision-making experience, and decreases
resistance to later test interpretations. This position is based
primarily on theory and logic, since research evidence is limited and
inconclusive. It is qualified by noting that it may be incompatible 
with some counselor styles, some counseling settings, and some 
clients. 

A concomitant effect of client participation in test selection is that 
it tends to make explicit the real limitations of the counselor's
knowledge. For example, here is what an honest counselor might say 
to a client concerned about success in college: “Verbal reasoning 
ability is important for success in college, but we don’t know exactly 
the minimum amount needed, nor can we tell you how much of a 
lack of it can be compensated for by effort and time spent in studies. 
If your score is very high, that’s a positive sign; if it’s very low, 
that’s a warning signal; but between the extremes your guess may 
be as good as mine" (p. 52). 

Since this kind of frankness does not often appear in counseling 
protocols, it might be hypothesized that counselors have both a 
strong commitment to the use of tests and also an unwillingness to 
make explicit the sparseness of the validity data upon which their
test interpretations are based. Whether this is seen as desirable or 
not, if true, would depend on one's counseling philosophy. 

Chapters 5 and 6 consider problems of test administration and 
scoring. They serve as a competent review of the gross and subtle 
influences that creep in between the cognitive processes of test selec-
tion and interpretation. They also remind us that for many of these
influences there is little we can do at present beyond becoming in- 
creasingly aware of the proper cautions that must be taken in stat- 
ing for what a test score really stands. 

Test interpretation is approached from two points of view, the 
statistical and the clinical. Goldman uses a “bridge” analogy to
show that the basic model in both cases is the same. That is, we
stand on one side of a river trying to explore some of the territory
on the other side so that a choice can be made as to which territory
one wishes to live in. The statistical and clinical processes, then,
provide the “bridge” between the two points. Unfortunately, the
bridges not only fail to reach, but also appear to be made of insub- 
stantial materials. 

After reviewing the literature on the two processes, however,
Goldman concludes that statistical approaches offer hope for the
future, since they have been proved superior to clinical prediction
in situations where the two methods could be compared. Since few
statistical cookbooks are presently available, however, the practicing
counselor must use clinical methods for some time to come,
either through choice or as a make-shift. Therefore, it behooves him
to improve his clinical skills. 




To this end, Goldman presents a list of suggestions for optimizing
effectiveness with clinical interpretive methods, plus a long section 
of illustrative cases in which the clinical and statistical approaches 
are compared and combined. The final three chapters are used to 
describe principles and methods of reporting test results to clients 
and other interested persons. 

To sum up, Goldman approaches the question of how to use tests 
in individual counseling by first raising the crucial issues that are 
involved and then bringing to bear relevant research studies. The 
author has done a commendable job in both instances. In the 
process, however, he has raised many troublesome questions which
need to be answered before one can claim the status of expert as
well as the title.

JOHN L. RINN
University of California, Berkeley 


The Recruitment and Training of Teacher Interns—A Report of the 
Southern California Teacher Education Project by C. Edward 
Meyers, Wendell E. Cannon, and D. Welty Lefever. Los 
Angeles: University of Southern California, School of Education, 
1960. Pp. 150 + Appendix.

This monograph reports on what is said to be the largest of “all
the experimental programs designed to recruit and prepare teachers
from the adult population.” It was made possible through the
cooperation of certain public school districts and a grant from the
Fund for the Advancement of Education.

The subjects of this study, numbering 341, received scholarships
for the major portion of tuition and fees and met the following
requirements: (1) receipt of a baccalaureate degree at least two
years previously; (2) acceptable to a cooperating school district for
employment as a regular classroom teacher on a provisional credential
in September; (3) acceptable to the University as a graduate
professional student and a credential candidate.

Each of the subjects (except for drop-outs) completed a ten-week
summer session in the University, taught in the public schools on a
provisional credential for one year during which time Saturday
courses were required on campus, and finished the requirements for
the regular credential at the end of the second or following summer
session. The same professional program was required of this group
as was required of the regular campus students with two exceptions:
the order of course presentation was somewhat [...]
and the one-year teaching experience on a provisional credential
was substituted for half the student teaching requirement. The
initial plan called for ten semester credits of work in [...]
growth and development, and educational psychology during
the first summer session; the Saturday seminar required during
the year's teaching consisted of the second half of the course in
methods and curriculum and the advanced course in elementary
education paralleled by courses in audio-visual techniques and in
music methods; the second summer session was to include a six-weeks
student teaching assignment supplemented by a course in art
methods and an advanced course in educational sociology.

This, in general, represents the nature of the program
which was in effect for one year, at which time certain modifications
were made: three weeks of student teaching replaced a portion of
psychology during the first summer session and the teaching during
the second summer session was reduced correspondingly.

It should be pointed out that the project plan has been continued
by the University since the completion of the Project, with certain
changes: the program operates on a nonscholarship basis and a full
six weeks of student teaching are experienced during the first
summer.

Although the Project was successful in recruiting and in providing
accelerated training for a selected group, it was also designed
to perform a research function: to appraise, in a sense as a con- 
trolled study, “the processes and products of what is often called 
the internship method of teacher preparation.” 

Chapters II and III provide information on the selection of the 
project candidates and the control groups (teachers prepared in 
regular teacher education programs who were teaching, for the most 
part, in the same districts as were project trainees); the latter 
chapter reports on certain characteristics of project teachers as they 
were assessed and compared with undergraduate elementary cre- 
dential candidates in the University’s regular teacher education 
program. 

Two basic kinds of information are presented in Chapter IV:
data on the credentials earned, the withdrawals, and the failures;
the survival of candidates into and beyond their first year of
teaching.


In Chapters V and VI, the project teachers are compared with
the control teachers as judged by principals’ ratings and various
other sources. These data are considered perhaps the most crucial
of the study and, in brief summary, indicate that the differences
were negligible. Chapters VII and VIII offer appraisal of the
Project from the point of view of the trainees and from that of the
project staff and others who worked with them.

The concluding chapter, IX, contains a summary of the study, its
conclusions, and the implications derived for future recruitment and
preparation of teachers by the internship method. The point is made
that the term “internship” is a somewhat loose term variously used
but, as defined in this study, it implies “that most essential teaching
skills were learned ‘on the job’ under the joint direction of
campus and district personnel.” The detailed findings of the study
are many; it is the work of vast numbers of University and school-
district personnel over a four-year period; it involved a most carefully
detailed research plan and highly skilled data evaluation; and
it offers to all who are interested in elementary teacher education a
competent source of reference, comparison, and suggestion. The
University personnel as well as the personnel in the cooperating
school districts are to be commended for their initiation of the
Project, its administration, and this clearly presented report.
DOROTHY M. COLLETT
La Verne College 
California 


Student Personnel Services in Colleges and Universities by E. G.
Williamson. New York: McGraw-Hill Book Company, 1961.
$7.50.
Though Williamson states that this is not a “how-to-do” hand- 
book for deans of students and other administrators, the book does 
describe the operation of student personnel services. These opera- 
tional descriptions, however, serve as illustrations for the basic 
premise that pervades the book: Student personnel services provide
a framework about which an integrated experience in living at college
can be established. The total collegiate experience is exciting
and stimulating. Student personnel services help to establish an
educative relationship with all the facets of institutional life: classroom,
laboratory, fraternity, etc.
In order to be more explicit, Williamson divides his premise into
five aspects: (a) development of all aspects of human individuality;
(b) unique individuality of each student; (c) teaching in the class-
room is not enough or sufficient in the education of some students;
(d) use of methods and relationships of an educative rather than an
authoritarian or chain-of-command type; and (e) incorporation
into student personnel services of new knowledge of human nature
and its development. Though the personnel worker provides techni-
cal services, he should not perform a service per se; the service
should be used to help students to full maturity. The
services should be coordinated on an institution-wide basis.
The book is divided into three sections. Part One, “Services and
Administrative Process,” sets the stage for the other two sections.
It provides the educational, psychological, historical, and social
basis for a student personnel philosophy. This section contains an
excellent review of administrative theory. Part Two, “Some Ad-
ministrative Processes,” deals with discipline, counseling, and
the consulting roles of student personnel workers. These areas are
examined with the idea of how to aid a student to reach full ma-
turity. Part Three, “New Services and Policies: Illustrative Cases,”
focuses on a series of philosophical foundations for certain emerging
new forms of student personnel services. Among the topics discussed
and illustrations given are: (a) making rules and policies; (b)
speaker policy and academic freedom; and (c) extracurriculum as
higher education.

Though Williamson’s main theme “constitutes a reaction against 
the development of the intellect as the dominant concern of higher 
education,” it is in reality a plea for an integrated approach in 
higher education. In essence, the entire university should serve as a 
classroom. To some this may be a revolutionary idea. However, 


HENRY KACZKOWSKI 
University of Wisconsin, Milwaukee 


pressed by the philosophical
the ethical framework indicated so clearly as Peck advances his
motivational theory of character and as the five character-type
dimensions are defined (Amoral, Expedient, Conforming,
Irrational-Conscientious, and Rational-Altruistic). Peck’s ethical
value system is revealed throughout this
fullest form in the last chapter, “Some


Study, has its genesis as far back as 1940 in plans for a longi-
tudinal study of child development in a Midwestern community. In
1941, “Prairie City” was selected; World War II intervened and
the longitudinal-study idea was modified into a series of “more or
less related investigations of various aspects of child development.”
A number of publications have resulted from the investigations in
Prairie City, and a number of small studies were planned and exe-
cuted. Two groups of children were chosen for intensive investiga-
tion, those born in 1926 and those born in 1933. A wealth of data
had been accumulated on the 1933 group and it was chosen as the
group for further investigation relative to moral character. The
reader of this volume has an opportunity, in the Preface, to learn
something of the nature of the almost monumental task which was
begun late in the year 1948. This task can best be expressed by
Peck:

In summary, this book represents the end product of years of 
exploratory study, largely inductive, by a good many people. 
If it now appears to present a simple theory, in reasonably clear 
terms, with some provisional evidence to support it, this is a 
consummation for which we are profoundly grateful. To the de- 
gree this clarity is approached, success might be said to have 
crowned the sixteen years of exploring, in largely uncharted 
country, by a goodly company of inquiring minds.

Preface, p. xiv 




It would be difficult, indeed, to deal adequately with a single as-
pect of this study, let alone attempt to do justice to all three:
(1) the general format of presentation; (2) the philosophical frame-
work; and (3) the study’s design and methodology.

The volume contains ten chapters preceded by the Preface and a
Note on Reading This Book, and followed by the Appendix, Bibli-
ography, and Index.
It is in the Preface that the reader has an opportunity to see this
study in its historical setting, beginning with the well-known Char-
acter Education Inquiry under the direction of Hartshorne, May,
and Shuttleworth; suggested reasons are offered for the differences
between the Character Education Inquiry, for example, and the
Moral-Character Study presented in this volume.
A Note on Reading This Book offers three major alternatives to
the reader: (1) begin by reading those chapters or chapter of spe-
cial interest in terms of an aspect of character development; (2)
read the Appendix first for a comprehensive presentation of meth-
odology; (3) most meaningfully, perhaps, read the chapters in se-
quence with reference to the Appendix. Regardless of the alternative
selected, the reader is always fully supported with appropriate refer-
ences for clarification.


As a very minimum, titles of the ten chapters should be listed: 


I. A Motivational Theory of Character 
II. The Setting, the Research Population, and the Research 
Procedure 
III. Case Studies of Three Character Types 
IV. Personality and Character 
V. Family Influences on Personality and Character 
VI. Moral Character and the Peer Group 
VII. Sources of Moral Values in the Social Environment 
VIII. The Consistency of Moral Character Through Time 
IX. Summary 
X. Some Implications and Prospects 


The Appendix contains fifty-four pages of concise and tightly
structured information dealing with the design and methodology of 
the study. It includes, for example, such aspects as the nature of the 
data, the rationale for the case-study method, definitions of the con- 
structed personality “traits,” a discussion dealing with the “Ra- 
tionale, Reliability, and Validity of the Ratings,” and the statistical 
analysis of the data (direct factor analysis with a cross check via 
obverse factor-analysis). 

All who feel or discharge some responsibility for the character
education of youth would do well to become familiar with this vol-
ume; the study raises many provocative questions and deals with
many controversial issues. The Summary, for example, looks at the 
Nature of Character, the Psychogenesis of Character, Character 
and the Culture, Family Influences on Character Formation, Char- 
acter and the Peer Group, and Community Influences on Character. 

The implications, pointed up in Chapter X, are far from limited. 
Peck takes a long, hard look at the world horizon, at human history, 
and at “the myth of inevitable ‘progress’ ”; he looks at MAN who 
perpetrated, or permitted, in this century, “the most terrible orgy of 
sadism and mass murder the world has ever seen." 

It is with respect to the last chapters of this volume that this re-
viewer feels most inept. It would, perhaps, be well to conclude this
review with a few comments from the last chapter in hopes that 
they will give some evidence of the fullness and richness of Peck’s 
philosophical framework: 


Children do as we do, not as we say. Their character tends to 
be an accurate reflection of the way their parents act toward
them, no matter what contrary pretenses some parents try to 
present to society. There is no way for parents to explain away 
or to give away this responsibility; it is a simple inexorable fact.

Specifically, it appears that if character is really as impor-
tant to us Americans as we say it is, then there should be
rigorous, alert recruiting and selection of teachers and other
youth leaders on grounds of maturity of personality and char-
acter. Their own natures are going to influence children much
more than any verbal information they convey.

Yet a great difference remains between the purposes, attitudes,
and actions of the adult who wants to control children for 
authoritarian purposes, and the adult who judiciously measures 
out control mixed with gradually increasing freedom, in a de- 
liberate effort to help and to allow children to grow up into ma- 
ture, wisely self-guiding people, whose moral behavior is in- 
ternally motivated and foresightedly intelligent. If this kind of 
maturity of character is what we want for our children, the ways 
to get there may be hard but they are clearly marked.
The millenium in human ethics will not come by wishing; it
is a far-distant goal that must be arduously worked for, on a
personal scale, day to day.
DOROTHY M. COLLETT


La Verne College 
California 


Management in Marketing: Text and Cases by Hector Lazo and 
Arnold Corbin. New York: McGraw-Hill Book Company, 1961.
Pp. 656. $8.50 
This is a book primarily for managers and about management. Its
field is marketing, and its thesis is the “new” marketing concept:
that marketing management is a business function as important,
in today’s “consumer” economy, as the traditional func-
tions of production and finance.

But this specialized approach does not reduce the book's con-
siderable value to persons other than marketing managers. Man-
agers in nearly any organization will find particularly interesting
and useful the authors’ recognition of the need for planning the en-
tire activities of an enterprise in terms of the purpose for which it
exists; namely, in a business enterprise, the satisfaction of customer
needs and wants. Research workers, especially in the industrial and
psychological fields, will appreciate the book's thorough treatment
of the insatiable appetite of today's business for facts:


The basic fact has come to be more and more accepted [illegible]
in today's competitive and complex business structure, no [illegible]
can survive for long if it depends solely on hunches, [illegible]
decisions based on personal experience and [illegible]
marketing management, the importance placed on [illegible]
and on knowledge of the market before action is [illegible]
field naturally means a much greater reliance on marketing re-
search in all its phases. . . . Fact finding with respect to require-
ments of the market and the means for satisfying that market,
including greater use of the social, behavioral, and mathematical
sciences in establishing these facts, becomes the core of market-
ing activity.


The authors’ main themes are these: 


1. All activities of a company not strictly financial, manu- 
facturing, or technical should be integrated under a single 
marketing head and oriented toward the customer and the 
customer’s needs. 

2. “Marketing,” under this definition, therefore includes all or 
most of the following: sales, advertising, product planning, 
marketing research, product service, promotion, transporta- 
tion, warehousing, inventory controls, and delivery. 

3. The marketing function should have equal status with those 
of production and finance, and the needs of the market—i.e., 
of customer satisfaction—should be a primary consideration 
in top management planning and policy. 

4. Business should direct modern technology’s streamlining and 
forecasting capacities toward reducing marketing costs and 
increasing distributive efficiency, as it has already done in the 
spheres of production and finance. (The average annual in- 
crease in productivity has been 3%, while that of distribution 
has been only 1%.) 


Lazo and Corbin are not the first to contend that company think- 
ing should emphasize “profit rather than volume and successful 
selling rather than effective manufacturing.” But they are the first 
to marshal the arguments for the new marketing concept in such 
comprehensive, convincing, and serviceable array. Their thorough 
analysis of today’s vastly expanded and highly competitive “con- 
sumer” economy impels the reader to investigate and to consider 
carefully their suggestions for coping with the problems created by 
that economy. Managers and researchers will find this book both a
rationale for the planned, integrated approach to organizational
problems and a valuable guide to the accomplishment of organiza-
tional goals.

J. H. RAINWATER, JR.
Los Angeles County Civil Service Commission 


From Adolescent to Adult by Percival M. Symonds and Arthur R.
Jensen. New York: Columbia University Press, 1961. Pp. viii + 
413. 

In a follow-up study of forty adolescents, 28 were reached thirteen
years later. These subjects were given the Rorschach and the Symonds
Picture-Story Test and were interviewed twice by Dr. Arthur Jensen.
The study was intended to provide (a) an indication of the pre-
dictability of the adolescent data, (b) some indication of the value
of fantasy material, and (c) information concerning the process of
emerging from adolescence into adulthood. Subjects were regarded
as a normal group of urban dwellers.

From the material obtained, there appeared to be marked per- 
sistence of themes in fantasy over the thirteen-year period. Themes 
of depression and wishful thinking increased whereas others de- 
creased. Adolescent fantasy frequently became openly expressed in 
behavior or personality. This was noted with respect to sex attitudes, 
self-confidence and anxiety, as well as other areas. 

The evidence for these observations is principally inferential. De-
tailed statistics are not presented and the quantitative analyses re-
ported are not complete. No control group was used, and the ex-
perimental design is inadequate, as the authors acknowledge.
Although clinicians may enjoy reading the case study data presented,
statisticians will be inclined to cast aside this work. To demonstrate
the predictability of fantasy material, the combined adolescent and
combined adult stories of six individuals were given to three assist-
ants to be matched. For each individual, there were ten stories from
the 1940 series stapled together and ten stories from the 1953 series.
“One assistant matched all six successfully and the other two had
only one shift in position.” No statistical data are presented to
indicate the p level or the statistical significance of these results.

The data are interpreted as “indicating that the total personality as
shown by the stories retained individual identity with remarkable
fidelity.” Actually, the identity often was established by the judges
on the basis of a peculiar idiom or method of expression rather than
general personality characteristics.
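
The significance level that the report omits can be approximated by
simple enumeration. What follows is a minimal sketch and not the
authors' analysis. It assumes that, under the null hypothesis, each
assistant's pairing of the six adolescent sets with the six adult sets
is a random one-to-one matching, and it reads "one shift in position"
as an interchange of two sets, leaving four of six correct. The
function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def chance_of_at_least(n_items: int, min_correct: int) -> Fraction:
    """Probability that a random one-to-one matching of n_items pairs
    gets at least min_correct of them right (exact, by enumeration)."""
    total = 0
    hits = 0
    for perm in permutations(range(n_items)):
        total += 1
        # A pair is correct when an item is matched with itself.
        correct = sum(1 for i, j in enumerate(perm) if i == j)
        if correct >= min_correct:
            hits += 1
    return Fraction(hits, total)


# Assumption: six story sets per judge, as described in the review.
p_all_six = chance_of_at_least(6, 6)    # every set matched correctly
p_one_swap = chance_of_at_least(6, 4)   # at most one interchange of two sets

print(p_all_six, float(p_all_six))
print(p_one_swap, float(p_one_swap))
```

Under these assumptions a single judge would match all six by chance
with probability 1/720, and would do at least as well as "one shift"
with probability 1/45. Such figures bear only on chance matching; they
do not answer the objection that the judges relied on idiom rather
than on general personality characteristics.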

As the authors feel they have established a relationship between
changes in fantasy and changes in experience, eight expository
principles are enunciated. One example: “If a tendency is expressed
in reality the opposite active trend may appear in fantasy or the
opposite passive trend may disappear from fantasy.” Unfortunately,
such ambiguous verbiage is found too frequently in the report.

Since the book focuses on fantasy material, it might be more ap-
propriately labeled From Adolescent Fantasy to Adult Fantasy. The
value of fantasy production for understanding an individual is sug-
gested by the case material presented, but conclusive evidence is
lacking. The data may stimulate a multiplicity of research [illegible],
but they do not yield scientifically acceptable evidence for
drawing conclusions. As an exploratory study, the work has [illegible].
Naive students will have to be cautioned that the data presented
are seldom adequate for reaching the conclusions that are stated.
WILLIAM COLEMAN
Houghton Mifflin (California) 




Psychology: An Introduction to the Study of Human Behavior by 
Henry Clay Lindgren and Donn Byrne. New York and London:
John Wiley & Sons, 1960. Pp. 429. 

The authors appear to have met the basic assumption of com- 
munication with beginning students in psychology by means of a 
very attractive and readable text. The organization of the book is 
excellent in appealing to the more common interests in the opening 
chapters and progressing to more detailed, less known aspects in the
later chapters. The widespread use of drawings, pictures, graphs, and
charts is meaningful and at times adds an air of humor often lack-
ing in texts. Chapters 8 and 9 are devoted to the discussion and
measurement of individual differences. Specifically, Chapter 8 is 
concerned with statistics. The figures showing normal and skewed 
distributions are excellent in both readability and numbers. The 
formula for the standard deviation is one of the less ominous and 
should delight students even if somewhat criticized by the authori- 
ties. The correlation coefficient is explained simply and to the point 
—using scatter diagrams to best advantage. Probability and levels 
of significance are also very readable for the beginning student. 
Validity and reliability are equally well handled. 

Chapter 9 deals with intelligence and creativity. The section on
intelligence is adequate. Creativity is very elusive to measurement
and while the short space devoted here is a noteworthy beginning,
creativity still is an area sadly lacking in both research and writings.

The book is a fresh approach to the introductory offering in psy- 
chology—it seems to take the middle road between the austere, 
highly conceptualized offerings of traditional psychology and the 
more human approach of educational psychology. 

There will be some lamentation on the lack of both depth and 
breadth in the areas of physiological and perceptual processes, but 
these important and highly specialized sections might better follow 
in more advanced courses. All in all, this is a very readable book 
that will gain much favor in its new approach.

ROY M. FITCH
San Fernando Valley State College 


The Meaning and Measurement of Neuroticism and Anxiety by
Raymond B. Cattell and Ivan H. Scheier. New York: The Ron-
ald Press Company, 1961. Pp. viii + 535.

Cattell and Scheier's The Meaning and Measurement of Neuroti-
cism and Anxiety is one of a series of books that has emerged from
about fifteen years of research at the Laboratory of Personality
Assessment and Group Behavior of the University of Illinois. It is
intended for two types of readers: 1) for the skilled practitioner in
psychiatry, applied psychology, and general medicine, and 2) for
the empirically-oriented clinical researcher. The authors have real-
ized the difficulties of addressing themselves to these two diverse
audiences and accordingly suggest that Chapters 3, 4, 5, 6, 14 and
15 of their 15-chapter volume might be of primary interest to the
practitioner; and that Chapters 8, 9, 10, 11, 12 and 13 might be of
more interest to the researcher. The remaining three chapters are
intended apparently for both types of readers.
The opening two chapters contain a discussion of “pre-metric” 
theory and both the practitioner and the researcher should find this 
an interesting commentary on the present state of knowledge in the 
mental health field. The authors’ diagnosis of “pre-metric” theory
is that it is “only a first approximation to the truth which is doomed
to arrest at the qualitative level.” Their prescription for cure is: 1)
precise, standard and clinically meaningful tests and 2) multivari-
ate statistical techniques. Concepts of anxiety and neurosis, which
have their roots in Freudian, Neo-Freudian, Adlerian and Jungian
writings, are reviewed and are adopted as valuable source dimen-
sions. These concepts are reformulated into hypotheses which are
then examined metrically throughout the remainder of the book. 
An excellent and readable introduction to factor analysis and
multivariate statistical techniques is offered in Chapter 3; but
beyond this chapter the practitioner of psychiatry and general
medicine, no matter how skilled, and for whom the next three chap-
ters are intended, will find the reading difficult. The practitioner,
unless he is already thoroughly familiar with previous published
material stemming from work done at the Laboratory of Person-
ality Assessment, will need to master a new language. He will be
introduced to Q and L factors, first and second order factors, type
and trait definitions, T data, conspect reliability, harria [illegible]
and premsia, comention and abcultion, corticalertia and
pathemia, zeppia, invia and exvia, and autia.
Compared to Chapters 4, 5, and 6, the seventh chapter is excep-
tionally lucid. Once again the practitioner is brought back to fa-
miliar ground when the authors discuss individual differences in the
modes of expression of neurosis and anxiety. Contributing to lu-
cidity is the report of results obtained from research done with
neurotic groups of individuals who were diagnosed along the fa-
miliar Kraepelinian syndrome dimensions. Unfortunately the infor-
mation describing the selection criteria used to differentiate these
groups is incomplete, and admittedly there was a “[illegible]bility”
of contamination of clinical judgment by the 16 P.F. Test
with which the authors hoped to compare clinical judgment.
The clinical researcher, who is sympathetic to Cattell and
Scheier’s brand of factorial design and who is not easily [illegible] by
the authors’ occasional lapses into word salad, will find
an abundance of research material in the first seven chapters.
Moreover, in the next six chapters he will be given the opportunity
to experience the authors’ thinking-through of the methodological
and theoretical ramifications of such knotty problems as the psy-
chophysiological interactions in anxiety and neurosis; and the influ-
ences of short-term environmental conditions, intra-organismic de-
terminants, and total cultural patterns on the formation of anxiety
and neurosis. Also the researcher should find of considerable interest 
the glimpse given him of the authors’ tentative multifactorial the- 
oretical system (Chapter 12), and the research program, consisting 
of some nine problem areas, which is mapped out for him (Chapter 
13). 

Addressing themselves primarily to the practitioner, the authors 
devote Chapters 14 and 15 to a discussion of the strategic use and 
availability of measuring instruments in clinical diagnosis (“Diag-
nostic Practice, 1970”) and therapy. In Chapter 14, after stating
their case for the use of factorially defined and objective measure- 
ments, Cattell and Scheier list about ten of the available tests and 
suggest how these might be used in various phases of diagnostic and 
therapeutic work. As might be expected, the tests are all from the 
Institute for Personality and Ability Testing (IPAT), and perhaps 
for this reason are treated uncritically by the authors. For example, 
they refer to the 16 P.F. Test as having been “proved potent as a
predictor in clinical diagnosis, occupational adjustment, and scho- 
lastic achievement” (p. 433). When one examines the references 
they cite in substantiation of this “proof” (Handbook for the Six-
teen Personality Factor Questionnaire, 1957; and IPAT Staff: The
IPAT Information Bulletin Series, Bulletin Nos. 1, 3, and 6, 1960), 
one is impressed only with the lack of satisfactory normative infor- 
mation and the low reliability coefficients (.45 to .55) for the scales. 
After presenting a chain of unimpressive arguments in which the 
authors explain away the need for large normative samples and 
substantial reliability coefficients, they admit that “none of the
above is meant to imply that all tests to be described are in final, 
fully polished form.” Indeed they are far from this form, and their 
realization of this probably leads Cattell and Scheier to recommend 
clinical research use of their tests rather than routine practice. 
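
To make concrete why reliability coefficients of .45 to .55 are
troubling, one may apply the classical test theory relation between
reliability and the standard error of measurement. This is a standard
textbook relation and not a computation reported by Cattell and
Scheier or by the reviewer; the symbols $s_x$ (standard deviation of
the scale scores) and $r_{xx}$ (reliability of the scale) are notation
introduced here for illustration.

```latex
\mathrm{SEM} = s_x \sqrt{1 - r_{xx}}, \qquad
r_{xx} = .45 \;\Rightarrow\; \mathrm{SEM} \approx 0.74\, s_x, \qquad
r_{xx} = .55 \;\Rightarrow\; \mathrm{SEM} \approx 0.67\, s_x
```

On this reckoning an individual's obtained score on such a scale is
uncertain by roughly two-thirds to three-quarters of a standard
deviation, which is consistent with the recommendation of research
use rather than routine practice.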

Two appendices are provided for the reader: one contains 38 
brief sketches and comments on the major research data presented 
in the book, some of which have not yet been published; and the 
other appendix is a 157-word glossary. The research appendix does 
not make clear how many times identical data, or portions of the 
same data, are analyzed and reanalyzed; but it is obvious that there 
is overlap of research samples. This is an unsatisfactory state of 
affairs, since one study is frequently cited as supporting or confirm- 
ing another one. The glossary should be extremely helpful to the 
reader who is not familiar with Cattell’s prior publications. A bibli-
ography containing 236 references, of which approximately 60 refer-
ences are to Cattell and his associates’ work, follows the ap-
pendices. A name index and a subject index complete the volume.
The main value of this book lies in its integration and attempt at
systematization of much of the factorial research on anxiety and
neuroticism that has been done at the Laboratory of Personality 
Assessment and Group Behavior. The practitioner who is looking 
“for an escape from much of the guesswork which now darkens his 
practice” (p. iii), and who hopes to obtain from this book clarifica- 
tion on the meaning of neuroticism and anxiety, will be confused 
and disappointed. Readers who are committed to factor analysis as
a personality research technique, and individuals who are seeking
a hard-headed, but conservative approach to clinical research will
get excellent mileage from The Meaning and Measurement of
Neuroticism and Anxiety and should find it a treasure of stimulat-
ing source material.

BENJAMIN KLEINMUNTZ 

Carnegie Institute of Technology 

Pittsburgh, Pennsylvania 


Personality Assessment and Diagnosis by Edward Bennett. New 
York: The Ronald Press Company, 1961. Pp. viii + 287. $8.50 
In laying the foundations for a new approach to personality as- 
sessment, Bennett begins by reviewing personality theory and the 
causes of maladjustment and abnormality. While the content of his 
position follows generally accepted dynamic formulations, the 
method of presentation, or schematization, is unusual in that it 
takes the form of a modified computer programming flow diagram. 
This should prove to be an interesting and effective pedagogic tech-
nique, for it enables one to see the “branchings” that can occur as a
result of different developmental tendencies.

The polydiagnostic technique, as Bennett calls it, is based on the
assumption that people can express their feelings and subjective
experience in a fairly direct manner if they are given a minimum
structure and help. The structure consists of providing the subject
with a question and a list of fifteen terms from which he is to select
three that provide the best answer. The question is repeated a
second, third, and fourth time; with each repetition the subject
selects an additional three answers from the list. After the fourth
round, only the three terms which the subject feels are least
applicable to the question are left.

There are [illegible] such sets of fifteen terms. The areas these
procedures attempt to measure are “(1) feelings about
the self in general, (2) wishes and desires, (3) [illegible]
what ways of life are valuable as means toward happiness and satis-
faction, and (4) feelings about other people in general.” (p. 73)
[illegible] of feelings are associated with each term. These are repre-
sented by the symbols ++, +, 0, -, --. The strongest positive
feeling (++) is expressed by the first round of three selections;
the next round of three selections is scored (+), the next (0), the
next (-), and the remaining three selections are scored with the
double minus (--). For questions that require the subject to select
the terms in order of least applicability, the (--) symbol is associ-
ated with the initial choice of three.
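
The scoring rule just described can be sketched in a few lines. This
is an illustration under stated assumptions and not Bennett's own
procedure or materials: the function name and the placeholder terms
are hypothetical, and for questions answered in order of least
applicability the sketch assumes that the whole scale is simply
reversed, whereas the text states only that the first three choices
receive the double minus.

```python
SYMBOLS = ["++", "+", "0", "-", "--"]  # strongest positive to strongest negative


def score_rounds(terms, rounds, least_first=False):
    """Assign one of five symbols to each of fifteen terms.

    terms: the fifteen terms offered to the subject.
    rounds: four lists of three terms, in the order they were chosen.
    least_first: True when the question asks for the least applicable
        terms first (assumed here to reverse the whole scale).
    """
    if len(terms) != 15 or len(set(terms)) != 15:
        raise ValueError("expected fifteen distinct terms")
    if len(rounds) != 4 or any(len(r) != 3 for r in rounds):
        raise ValueError("expected four rounds of three selections")
    chosen = [t for r in rounds for t in r]
    if len(set(chosen)) != 12 or not set(chosen) <= set(terms):
        raise ValueError("selections must be distinct terms from the list")

    # The three terms never chosen form the fifth group.
    leftover = [t for t in terms if t not in chosen]
    symbols = SYMBOLS[::-1] if least_first else SYMBOLS

    scores = {}
    for symbol, group in zip(symbols, list(rounds) + [leftover]):
        for term in group:
            scores[term] = symbol
    return scores


# Hypothetical placeholder terms; the actual lists are not reproduced here.
terms = [f"term{i}" for i in range(1, 16)]
rounds = [terms[0:3], terms[3:6], terms[6:9], terms[9:12]]
scores = score_rounds(terms, rounds)
reversed_scores = score_rounds(terms, rounds, least_first=True)
```

Each set thus yields exactly three terms at each of the five scale
points, which is what makes the procedure a forced-choice one.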

This represents the method of polydiagnostic personality assess- 
ment as proposed by Bennett. The reader will recognize the close 
similarity of this technique to the rating scale and the adjective 
check list. There are also obvious differences. It should be noted 
that what is being proposed is not a personality test but a method 
for obtaining quantifiable measures of subjective feelings. The 
question remains—what are the advantages of this technique? The 
author illustrates its value by means of case histories. These studies 
show that in the hands of a skilled technician the polydiagnostic
technique provides a great deal of information on the subject's
adjustment, conflicts, and anxieties. How valid are these interpre-
tations? Well, no validity coefficients are presented, and possibly it
is unfair to expect such data at this time. If the technique is used
simply to gather data for later discussion in an interview situation,
validity statistics may not be required. Bennett, in fact, suggests
this use. He also believes that the quantification of data on sub- 
jective feelings, obtainable by this technique, would constitute a 
bridge between the requirements of experimental and clinical psy- 
chology. 

However, more than a technique is needed to construct a bridge; 
first, many workers in the field must use the instrument and gather 
the data with which to build the span. At the present time all that 
can be said is that in Personality Assessment and Diagnosis Edward 
Bennett describes an original procedure for eliciting subjective feel-
ings about the self and society and of scoring these feelings on a
five-point scale by means of forced-choice judgments. The tech- 
nique appears to have merit and should be examined by both ex- 
perimental and clinical psychologists, for only through their use 
can its true value be determined. 

HAROLD BORKO
System Development Corporation 


Essai d'adaptation de l'échelle d'intelligence pour enfants de D.
Wechsler (W.I.S.C.) à des écoliers belges d'expression française
(Preliminary adaptation of the WISC to French-speaking Bel-
gian school children) by Raymonde Berte. Louvain, Belgium:
Document #8 of the Centre National de Recherches de Psy-
chotechnique Scolaire (Psychologie appliquée à l'Education),
[illegible]; pp. 64. The published shorter version of a doctoral disserta-
tion, same title, 1960, pp. viii + 248.
Ever since the original loan of an individual intelligence scale,
[illegible] some fifty years ago by the Frenchman Binet and the
American Goddard, American psychometricians have been busily
paying back the debt to their European colleagues. In France, for
example, the most popular instrument for the global assessment of
intellectual skills seems to be a translation of the 1937 Stanford Re-
vision, known there as the “Terman-Merrill.” The latest installment
in this continuing exchange is represented here in the work of Dr.
Berte, a faculty member of the Institut de Psychologie Appliquée et
de Pédagogie at the University of Louvain, Belgium. Almost single-
handedly she has performed the immense task of translating, stand-
ardizing, and statistically analyzing the data from the Wechsler
Intelligence Scale for Children (Psychological Corporation, 1949)
for the French-speaking school population of bilingual Belgium.
Though not so polished as the products of relatively high-powered
American testing concerns, Dr. Berte’s contribution is most com-
petent and, certainly, most welcome.
The first chapter of this two-part monograph is introductory. It
presents, much in the manner of the original WISC Manual, a
justification for the development of a clinical intelligence scale for
children, a discussion of the Wechslerian view of intelligence, a de-
scription of the test’s format and administration, and a strong en-
dorsement and explanation of the use of point scores and deviation
IQ's as opposed to the mental and chronological age metric. It
concludes with a brief summary of the American research literature
through 1959 as it reflects on questions as to discrepancies between
“verbal” and “performance” scale scores, factorial composition,
consistency of measurement, and validity. It is regrettable that the
author was not able to draw upon the fine WISC review article of
Littell (Psychological Bulletin, 57, 2, 1960), as there is a dearth of
validity study citations. Satisfaction with predictive and concurrent
validities at various age, socio-economic, geographical, and [illegible]
setting levels would seem a prerequisite to the enterprise of
adapting the WISC to a different national culture.
importantly, this reviewer would like to have seen в mo 
analysis of the scale in the construct validity sense. Wha 
ical stance informs Wechsler’s discussion of “general intel- 
as measured by his test? If general intelligence is a ira 
Aggregate of cognitive abilities and personality es M 
аге the relevant variables, and what are their fune um 
hips? What is the position of general intelligence n ds 
of substantial theoretical agreement in репор d 
«developmental, learning, perception, or cognition? M у 

does test behavior, as mediated by individual items, refec 


these theoretical positions? What correlational, or experimental, 
variable-manipulating predictions would one hazard in the investi- 
gation of the test’s properties? The import of these and similar ques- 
tions should not weigh on the present author alone; indeed, they 
should prove embarrassing to a wide range of psychological
researchers and theorists. Binet was vitally interested in theoretical
questions. Why have we gladly accepted his technique but eschewed 
his systematic orientation? 

The concrete research contributions of this work are contained in 
the second chapter. Here the author describes the translation of the 
test, the standardization and scoring procedures, and the statistical 
analysis of the gathered data, all amply illustrated with tabular 
materials. In the adaptation the six performance scale subtests went 
unmodified. For the six verbal scale subtests it was found necessary 
to modify the scoring criteria for the Comprehension and Simi- 
larities subtests slightly and to transpose the Arithmetic subtest’s 
units of measurement into Belgian terms; otherwise, the items were 
translated directly into French. Most Vocabulary items underwent
a straight-forward dictionary translation; the greatest change was
the replacement of “hara-kiri” by the French suicide. A number
of items of the Information subtest were modified in interesting
ways: “Who wrote ‘Romeo and Juliet’?” becomes “Qui a écrit ‘Le
Cid’?”; “How far is it from New York to Chicago?” is transformed
into “Quelle est la distance de Bruxelles à Ostende?”!

Apparently, relative item difficulty levels (reported in terms of
Davis’ difficulty index) remained fairly consistent for most subtests
when taken over all age groups from 5 through 15 years. Significant
reversals appeared on four subtests: Information, Vocabulary,
Comprehension, and Similarities, with only slight modifications
indicated for the latter two. On Information, however, 13 of 30 items
shifted two or more ordinal positions. For example, while American
children have relatively less difficulty in naming the color of rubies
(position #12 vs. #17 Belgian), Belgian children more readily define
“barometer” (position #19 vs. #27 American). The difficulty
levels of only three out of the last 30 vocabulary items remained
unscathed in the adaptation. As extreme examples, “nitroglycerine”
became 13 ordinal positions more difficult, while “catacomb” became
easier by six. These new difficulty indices were computed from
the data of the norming sample.

A particularly happy aspect of this work was the standardization
procedure employed by Dr. Berte. The sample consisted of 600 subjects,
50 at each age level from 7 through 15 years (25 boys and
25 girls) plus a total of 73 boys and 77 girls in the 5 and 6 year
brackets. Because of the fine representation achieved in the sample,
a minimum of 50 subjects at each age level seems adequate.
Improbable as it may seem, all testing was carried out by the author
and a single assistant! Each subject was tested in his own school
under more or less standard conditions and within ±1½ months of
the mid-point between two birthdays. The sample was stratified
according to population percentages derived from a ten-year-old
national census. Variables included geographical zones (four
provinces and two isolated French-speaking townships), rural-urban
districts (cities, towns, and villages), socio-economic level (eight
categories based on parental occupation), and type of school system
(state versus parochial). The testing was carried out in a total of
102 schools in 45 communities. As far as possible the sampling 
variables were controlled within each age group, as well as over the 
total sample; the author is to be commended for her painstaking 
efforts in achieving such a fine program. 

For each subtest at each age level, raw score distributions were
standardized and normalized by a “sta-eleven” transformation.
While the American version employs a 21-interval transformation,
the author’s choice of 11 intervals appears quite reasonable, given
the small number of items and resulting moderate reliabilities of
most subtests. Next, these transformed scores were summed over
the appropriate 5 “verbal” subtests, 5 “performance” subtests, and
totally to yield the usual three scales. Each of these three
distributions was then converted into a distribution with mean = 100
and standard deviation = 15 in the traditional manner. Going Wechsler
one step further, the author supplies two I.Q. conversion tables, one
for scaled scores in the 8-11 year range and one for the 5-7 and
12-15 year groups, based on the finding of differential score
variances between these two categories of age groups.

These manipulations lead to some interesting results. Although
average I.Q.’s at each of the 11 age levels approach closely to 100,
two of three reported full-scale S.D.’s are greater than 17 points.
Apparently reflecting the difficulty of the test as a whole, full-scale
I.Q. score distributions are rather strongly skewed positively; of
[...]50 representative scores, 11.6% fall above +1 S.D., 68.8% fall
within ±1 S.D., and 19.6% fall below −1 S.D. Theoretical normal
proportions are 15.85%, 68.3%, and 15.85%, respectively. Finally,
it appears that petit Belgians are more gifted than petite Belgians.
[...] of 11 age levels there is an average full-scale I.Q. [...] of
approximately 7.5 points; differences are nil at the [...] age
brackets. Unfortunately, in this case as in several [...] throughout
the work, the author has not utilized [...] variances, and
proportions, relying instead on direct [...]
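
The comparison with normal-curve proportions made above, and the kind of significance test the reviewers find missing, can be sketched briefly. This is a minimal illustration, not the reviewers' or the author's procedure: the function names are hypothetical, and the observed counts in the usage note are invented to match the quoted percentages, since the actual number of scores is not legible in the text.

```python
import math


def normal_band_proportions(k=1.0):
    """Proportions of a normal distribution above +k S.D., within +/-k S.D.,
    and below -k S.D., computed from the error function."""
    upper = 0.5 * (1.0 - math.erf(k / math.sqrt(2.0)))
    middle = 1.0 - 2.0 * upper
    return upper, middle, upper


def chi_square_gof(observed_counts, expected_props):
    """Pearson goodness-of-fit statistic for observed counts against
    expected proportions (degrees of freedom = number of bands - 1)."""
    n = sum(observed_counts)
    return sum((o - n * p) ** 2 / (n * p)
               for o, p in zip(observed_counts, expected_props))


if __name__ == "__main__":
    props = normal_band_proportions(1.0)
    print([round(100 * p, 2) for p in props])
    # Hypothetical counts chosen only to reproduce 11.6%, 68.8%, 19.6%.
    print(chi_square_gof([29, 172, 49], props))
```

The statistic would then be referred to a chi-square table with two degrees of freedom; whether the departure from normality is significant depends on the true number of scores, which is not assumed here.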

[...] final step in the data analysis, we are presented with tables
abundantly illustrating subtest, verbal, performance, and [...]
internal consistencies (via Cronbach’s coefficient alpha), standard
errors of measurement, and matrices of subtest and scale
intercorrelations at three representative age levels. The
“reliabilities” computed via coefficient alpha are generally below
those reported by Wechsler, who utilized a split-half technique. They
reach a fairly adequate high, however, of .89 for the full-scale,
followed by .78 and .85 for the performance and verbal scales. Test
intercorrelation matrices follow the usual pattern, with Information
and Vocabulary subtests showing the strongest relationships to the
full-scale (r > .70), and performance tests, with the exclusion of
Block Design, showing rather low degrees of relationship. Picture
Completion seems almost worthless at most age levels, correlating
weakly with individual subtests and with the three separate scales.
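
For readers unfamiliar with the index, coefficient alpha as named above can be computed from a table of part scores as follows. This is a minimal sketch of the standard formula, assuming population (N-denominator) variances; the function name and the toy data are illustrative only and are not taken from the monograph.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Coefficient alpha for a list of examinee rows, each row holding
    that examinee's scores on the k parts (items or subtests)."""
    k = len(scores[0])
    part_variances = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(part_variances) / total_variance)


if __name__ == "__main__":
    # Toy data: four examinees, three parts.
    print(cronbach_alpha([[2, 3, 3], [1, 1, 2], [4, 5, 4], [3, 3, 5]]))
```

Alpha depends on all part intercorrelations, whereas a split-half coefficient depends on one particular division of the test, which is one reason the two kinds of estimate need not agree.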

With admirable detachment and undue modesty, Dr. Berte does 
not hesitate to point out the limitations of her study or of the in- 
strument itself. For this, as well as for her knowledgeable, energetic, 
and straight-forward attack on an arduous task, she deserves high 
credit and hearty congratulations. We join her in hoping that the 
mise au point of the WISC for Belgian students will serve, not only 
as a guide, but as a stimulus to greater psychometric research efforts 
on the part of her European colleagues, some of whom, we suspect, 
deny the programmatic validity of Thorndike’s classic “Whatever 
exists at all exists in some amount." 

ANDREW B. CRIDER* AND JULIAN C. STANLEY
Laboratory of Experimental Design 
University of Wisconsin 


Organization of Special Education for Mentally Deficient Children
by Rachel Gampert and Roger Girod. Geneva: International
Bureau of Education, 1960. Pp. 272.

This monograph presents the results of a mail questionnaire on 
the organization of special education for educable “mentally de- 
ficient” children, completed by the principal educational agency in 
71 countries. It was undertaken by the Research Division of the 
International Bureau of Education. The scope of the study is
indicated by the questionnaire, which contained the following
headings: I. Methods of Detection and Selection; II. Mentally Deficient
Children and Compulsory Education; III. Structure of the Special
Education System for Mentally Deficient Children; IV. Methods
of Education; V. Post-School Care; VI. Teaching Staff; VII. Measures
Contemplated for the Next Few Years; VIII. International
Assistance.

The main body of the monograph contains a comparative analysis
by country of responses to selected questions under the eight
topics listed above; the remainder of the monograph presents
detailed responses by country to each question.


* Now at the Department of Social Relations, Harvard University. 




The wide variety of techniques used in selecting students for
special [...] is striking. Twenty-four of the responding countries had
no facilities for assessing students for special classes, and many did
not offer such classes. Most of the remaining countries used I.Q. in
conjunction with medical examinations, interviews, and observations
as primary tools for selecting students for special classes. Quite
[...], the Binet, WISC, or one of their local counterparts was used
in assessing intellectual capacities. Unfortunately, little is known
about these translated tests. The shortage of professional personnel
and funds available for test development in many of these countries
probably leaves many of these instruments open to question. Even in
the more technologically advanced countries, test standardization
[...] leaves much to be desired. The situation does not seem hopeful.
Only four countries made specific reference to test development and
[...]ement as areas of anticipated future work.

[...] countries mentioned research on the nature of mental
deficiency or on educational methods as areas of future work. Many
countries, however, were concerned solely with developing operational
educational programs. As a result, such problems as arousing
public awareness about mental deficiency, obtaining favorable
legislation, and securing funds and trained personnel were seen as basic
at this time. We can expect a global frontal attack on mental
retardation only after these local problems have been solved. The
authors of the monograph, here reviewed, have done an excellent
job of summarizing the educational developments to date.
REGINALD L. JONES 
Miami University 


Introduction to Social Welfare (Second Edition) by Walter A.
Friedlander. Englewood Cliffs, New Jersey: Prentice-Hall, Inc.,
1961. Pp. 589.

A book of noteworthy value is this second edition by Dr.
Friedlander. The selection, assimilation, and organization of content
re[...]g research and labor. He has made this material thoroughly
comprehensive and all-inclusive. Highly factual and lucid explanations
of the broad and diversified field of social welfare are brought
[...]. Distinguished features of the book include the historical
[...]ction and underlying social philosophy behind the [...],
practices, and legality of social welfare and its essential role
[...].

Salient features are concerned with the critical analysis and
appraisal of the structure and function of social work processes and
[...] dynamics in contemporary life and culture. Important changes
in legislation and latest advancements in social work practices are
[...]ted in the light of incorporated insights and knowledges from
[...]ous disciplines of political science, sociology, economics,
medicine, psychology, anthropology, biology, history, education,
and philosophy, in understanding human needs, human motivations,
and the dynamics of human relationships.

Illuminating the original text, this second edition brings the topic
up to date and focuses attention on the new aspects and principles
of modernistic organized social welfare in a period of scientific
enlightenment and of great technical achievement. Its recognition and
willingness to deal scientifically with the concomitant effects of such
changes are encouraging signs in a developing profession. Main areas
under discussion are titled, “The Historical Development of Social
Welfare,” “Social Work Processes,” “Social Welfare Programs and
Practices,” with revised bibliographies and reading suggestions
pointing out trends included with the most essential chapters. The
topic, “Social Welfare Programs and Practices,” covers such subjects
as community services, income security, family social services,
child welfare, medical and psychiatric social work, delinquency,
leisure-time activities, services for special needs, international social
welfare, and professional aspects of social work. These subjects are
critically treated and analyzed in their ability to cope with the
social problems of our present industrial society.

Adoption of this book as a standard text by colleges and graduate
schools is highly recommended. This work is invaluable not only to
the general reader, but also to the students, educators, practitioners,
and all citizens concerned with the conservation of human resources
to which public and voluntary programs are oriented.

Maser E. Hayes
University of Southern California


VOLUME TWENTY-TWO, NUMBER TWO, SUMMER, 1962




ERRATA 


The following corrections should be made in “Predictive Validities
in an Institute of Technology,” by E. O. Swanson and R. F. Berdie,
which appeared in the Winter 1961 issue, Volume XXI, pages
1001-1008:


1. Page 1005, large paragraph in middle, 14th line, the statement
in parentheses should read “(correlation coefficients of .41 and
.50, respectively).”


2. Page 1005, same paragraph, last sentence should read “The
CEEB-M score shows the second highest correlation with first
quarter GPA, .50.”


3. Page 1006, Table 1, entries for the column headed CEEB-Math
should read (from top down): .26, .50, .43, .62, .37, .62, .38, .42,
.56, and .58.


4. Page 1006, also Table 1, entries for the row headed CEEB-
Math should read (for the last three column entries): .50,
630.2, and 78.91.


5. Page 1006, also Table 1, the coefficient for line four of the
Selected Multiple Correlations should be .57.


6. Page 1007, Table 2, the last correlation coefficient of this table,
Math Grade vs. CEEB-Math, should be .467.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 2, 1962


ESTIMATING NORMS BY ITEM-SAMPLING¹


FREDERIC M. LORD 
Educational Testing Service 


TRULY representative national norms are seldom obtained for any
published test. The most serious obstacle is the fact that not every
school is willing, at the request of some test publisher, to suspend
its accustomed activities and require its students to spend a class
period or more taking tests. As a result, published “national” norms
usually do not represent the nation’s schools, but, at best, only those
willing at a particular time to cooperate with a particular test
publisher.

The problem of getting each school's cooperation would be less
serious if only a few moments of each student's time were required,
rather than an entire class period. This raises the question of
whether the performance of a large group on a long test can be
estimated by administering only a few items to each student.

If such methods of estimating group performance were possible,
they would be helpful not only for norming a single test, but even
more for norming a number of tests simultaneously. They might
also be helpful in any research study of group performance where
the testing time or the scoring costs would otherwise be prohibitive.

In such cases, it will not always be a question of substituting a
[...] estimate obtained by item-sampling for a wholly satisfactory
determination of the scores desired. When the administration
of an entire test to an entire population of examinees is impossible,
the question is whether estimates obtained by item-sampling are
better or worse than estimates obtainable by other methods, such as
sampling of schools or of examinees.

¹ This work was supported by contract Nonr-2752(00) between the Office of
Naval Research and Educational Testing Service. Reproduction in whole or in
part for any purpose of the United States Government is permitted.

The present study is concerned with just such a comparison of 
estimates. The effectiveness of item-sampling for estimating a 
group’s mean score was investigated in an earlier study (Johnson 
& Lord, 1958). Now the attempt is made to estimate the entire 
frequency distribution of scores, rather than just the group mean. 

The plan of the study is described in section 1. The rationale for
estimating the norms distribution by item-sampling is outlined in
section 2. The results obtained are given in section 3, which is
followed by a brief discussion.


1. Plan of Study 


Suppose we have a 70-item test and a norms population of 1,000 
individuals. We need an estimate of the score distribution for the 
entire norms population. Unfortunately, either available testing 
time or available scoring facilities are so limited that we can ad- 
minister the 70-item test to only 100 examinees. Is it possible that 
we can obtain a better estimate of the norms distribution by testing 
each of the 1,000 individuals in the norms population with just 7 
items than we can by giving the actual 70-item test to a random 
sample of 100 individuals? 

In the present study, an answer to this question was sought from
the vocabulary-test answer sheets of a nationwide sample of 1,000 
college seniors. The recorded responses for the first 70 items only 
were used. Since everyone finished these items in the testing time 
available, the problem of “not-reached” items does not enter the 
picture.

The “population norms” distribution to be estimated is taken to
be the distribution of the test scores of all 1,000 examinees on the
70-item test. Each score referred to here is simply the number of
right answers marked by the examinee, as are all other test scores
involved in the present study.

The population was subdivided at random into 10 nonoverlapping
groups of 100 examinees each. The frequency distribution of the
70-item test score for each group is here considered as a
conventional “examinee-sample” estimate of the population norms
distribution.

² The writer is indebted to Robert L. Ebel for suggesting this study
and pointing out its importance.

An index of the accuracy of each of the 10 examinee-sample estimates
was obtained by computing the following measure of discrepancy:

$$D = \frac{1}{10}\sum \frac{(f_1 - \phi)^2}{\phi}, \qquad (1)$$

where $\phi$ is the frequency in the norms distribution and $f_1$ is 10
times the frequency in the 100-examinee sample. For computing D,
adjacent test scores were grouped into score intervals where necessary
in order that each $\phi$ should be at least 10. The summation in
(1) is over the 48 score intervals thus obtained.

The discrepancy measure D is obviously related to chi-square. 
Since it cannot be used to make a significance test, however, it will 
be better here to think of it simply as a measure of the "distance" 
or discrepancy between two frequency distributions. (There is 
clearly no need for a significance test here since it is known in ad- 
vance that each group really is a random sample drawn from the 
norms population.)
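
The computation of D can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's procedure: equation (1) is taken to be one tenth of the sum of squared differences between the rescaled sample frequency and the norms frequency, each divided by the norms frequency (a form consistent with the chi-square values quoted in section 3); the grouping rule, which pools adjacent scores from the low end until the norms frequency reaches 10, is an assumption, since the text does not say how intervals were formed; and all function names are hypothetical.

```python
def group_scores(norms_freq, est_freq, min_count=10):
    """Pool adjacent score points, starting from the lowest score, until each
    pooled norms frequency is at least min_count (assumed grouping rule)."""
    groups = []
    acc_norms = acc_est = 0.0
    for phi, f1 in zip(norms_freq, est_freq):
        acc_norms += phi
        acc_est += f1
        if acc_norms >= min_count:
            groups.append((acc_norms, acc_est))
            acc_norms = acc_est = 0.0
    if acc_norms > 0 or acc_est > 0:
        if groups:  # fold a short remainder into the last interval
            last_norms, last_est = groups.pop()
            groups.append((last_norms + acc_norms, last_est + acc_est))
        else:
            groups.append((acc_norms, acc_est))
    return groups


def discrepancy(norms_freq, sample_freq, scale=10.0, min_count=10):
    """Discrepancy D between a norms distribution and a sample distribution
    whose frequencies are first multiplied by `scale` (10 in the study)."""
    est_freq = [scale * f for f in sample_freq]
    groups = group_scores(norms_freq, est_freq, min_count)
    return sum((f1 - phi) ** 2 / phi for phi, f1 in groups) / scale
```

For an item-sample estimate already expressed on the 1,000-case scale, the same function could be called with `scale=1.0` on the frequencies and the result divided by 10, which again is an assumption about how the index was applied.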

A single “item-sample” estimate of the norms distribution was
next obtained, as described in the following section. Another value
of D was obtained by substituting the item-sample estimate of $\phi$ for
$f_1$ in equation (1). The relative merits of the examinee-sample and
the item-sample estimates are judged from the various values of D.


2. The Item-Sample Estimate 


Ten samples of 7 items each were drawn with replacement,
independently, and at random from the 70 items involved in the present
study. Each sample of items was treated as a 7-item test. The ten
7-item tests were assigned at random, one to each of the 10 groups
of 100 examinees. The answer sheet of each examinee in each group
was then scored on the appropriate 7-item test.

(In retrospect, the foregoing item-sampling procedure is seen to
have been unnecessarily inefficient. Items were sampled with
replacement after each sampling for the reason that such sampling is
effectively the same as sampling from an infinite pool of items, and
the available formulas in Lord (1960) for utilizing the resulting
data are discussed in terms of sampling from an infinite pool. It
would have been better to sample without replacement, thus
dividing the 70 items at random into 10 nonoverlapping 7-item tests.
Hooke’s (1956) basic derivations show that the same formulas would
be valid for such sampling, without replacement. This would
obviously be preferable since, in the actual sampling with replacement,
18 of the 70 items were drawn more than once and 25 were never
drawn at all.)

The following statistics were computed for each group from the
data on the 7-item scores:

$$g_1 = \frac{1}{100}\sum_{a=1}^{100} x_a = \bar{x}, \qquad (2)$$

$$g_{\pi\pi} = \sum_{i=1}^{7} \pi_i^2, \qquad (3)$$

$$g_2 = \frac{1}{100}\sum_{a=1}^{100} x_a^2, \qquad (4)$$

$$g_{11} = g_1^2, \qquad (5)$$

where $x_a$ is the 7-item score of examinee $a$ and $\pi_i$ is the proportion
of examinees in the group answering item $i$ correctly.

As shown by Lord (1960, Table 2), corresponding statistics for a
randomly parallel 70-item test can be estimated from these values
for each group by the formulas

$$G_1 = 10 g_1, \qquad (6)$$

$$G_2 = 115 g_2 - 105 g_1, \qquad (7)$$

$$G_{11} = 115 g_{11} - 105 g_{\pi\pi}. \qquad (8)$$

The mean and variance of the 70-item test are then estimated for
each group by

$$\hat{M} = G_1, \qquad (9)$$

$$\hat{V} = \frac{100}{99}(G_2 - G_{11}). \qquad (10)$$

The values obtained for each of the 10 groups from (9) and (10),
respectively, were averaged together to obtain, finally, one estimate
of the mean and one estimate of the variance of the 70-item test
in the 1000-case norms population. These two “item-sampling”
estimates will be denoted by M and V.
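
The per-group computation can be sketched in code. This is a minimal illustration under a stated assumption about the form of equations (2) through (10): the short-test mean is multiplied by 10 (the ratio of test lengths), and sums over pairs of distinct items are multiplied by 115 (the ratio of 70 x 69 to 7 x 6 ordered pairs). The function names and the generalization to arbitrary test lengths are hypothetical and not taken from Lord (1960).

```python
def expansion_coefficients(n_short, n_long):
    """Multipliers for single-item terms and for distinct-item-pair terms
    when projecting from an n_short-item test to an n_long-item test."""
    k1 = n_long / n_short
    k2 = n_long * (n_long - 1) / (n_short * (n_short - 1))
    return k1, k2


def item_sample_estimates(responses, n_long):
    """Estimate the mean and variance of an n_long-item number-right score
    from 0/1 responses of one group to a short random sample of the items.

    responses: list of examinee rows, each a list of 0/1 item scores.
    """
    n_exam = len(responses)
    n_short = len(responses[0])
    scores = [sum(row) for row in responses]
    # Short-test statistics (assumed reading of equations 2 to 5).
    g1 = sum(scores) / n_exam
    pis = [sum(row[i] for row in responses) / n_exam for i in range(n_short)]
    g_pipi = sum(p * p for p in pis)
    g2 = sum(s * s for s in scores) / n_exam
    g11 = g1 * g1
    # Long-test statistics (assumed reading of equations 6 to 8).
    k1, k2 = expansion_coefficients(n_short, n_long)
    big_g1 = k1 * g1
    big_g2 = k2 * g2 - (k2 - k1) * g1
    big_g11 = k2 * g11 - (k2 - k1) * g_pipi
    # Mean and variance with an N - 1 denominator (equations 9 and 10).
    mean = big_g1
    variance = n_exam / (n_exam - 1) * (big_g2 - big_g11)
    return mean, variance
```

With 7 items sampled from 70, `expansion_coefficients(7, 70)` gives the multipliers 10 and 115, so that 115 minus 10 yields the 105 appearing in the formulas. Averaging the estimates over every possible item sample of a small complete data set reproduces the full-test mean and variance exactly, which is the sense in which estimates of this kind are unbiased.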




These estimates should be good ones since only a bare minimum 
of assumptions were made in obtaining them. The main purpose, 
however, is to obtain an estimate of the entire norms distribution, 
not merely of its mean and variance. An effective procedure would 
probably be to use the available formulas in Lord (1960) to estimate 
higher-order moments of the norms distribution, and then to esti- 
mate the shape of this distribution by fitting a Pearson Type I 
curve, say, to the estimated moments. 

The use of higher-order moments was avoided, however, as too 
laborious computationally. Instead, the relatively simple method 
was adopted of fitting a negative hypergeometric distribution [to 
be denoted by H (x)] to the three parameters M, V, and n (= 70), 
the number of items. (This distribution has some theoretical basis 
and, more important, has been found (Keats & Lord, in press) 
to provide a reasonably good fit for quite a variety of test-score 
distributions when the test score is the number of right answers.) 
The fitted distribution is, for present purposes, the item-sample 
estimate of the norms distribution. 
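
The fitting step can be sketched as follows. This is a minimal illustration, not the procedure of Keats and Lord: it assumes the negative hypergeometric distribution is written in its equivalent beta-binomial form with parameters alpha and beta, and it matches the mean and variance by the method of moments. The function names are hypothetical.

```python
import math


def fit_negative_hypergeometric(mean, variance, n):
    """Return (alpha, beta) of the beta-binomial form whose mean and
    variance equal the given values for an n-item number-right score."""
    p = mean / n
    binomial_var = n * p * (1.0 - p)   # variance if all examinees had equal ability
    max_var = n * binomial_var         # upper bound on the attainable variance
    if not binomial_var < variance < max_var:
        raise ValueError("variance outside the range this family can fit")
    s = (max_var - variance) / (variance - binomial_var)  # s = alpha + beta
    return p * s, (1.0 - p) * s


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def score_probability(x, n, alpha, beta):
    """Probability of number-right score x (0..n) under the fitted form."""
    log_comb = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return math.exp(log_comb + log_beta(x + alpha, n - x + beta)
                    - log_beta(alpha, beta))


if __name__ == "__main__":
    # Item-sample estimates reported in Table 1: M = 46.1, V = 162.1, n = 70.
    alpha, beta = fit_negative_hypergeometric(46.1, 162.1, 70)
    expected = [1000 * score_probability(x, 70, alpha, beta) for x in range(71)]
    print(round(sum(expected), 6))
```

Multiplying each probability by 1,000 gives estimated frequencies on the same scale as the norms distribution, which is the form needed for the discrepancy index of equation (1).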

In practical norming work, the 7-item tests would have been 
administered by themselves, not as part of a longer test. Any practi- 
cal application of the item-sampling method thus involves the 
further assumption that the examinees’ performance on the items is 
not too greatly affected by the context in which they are ad- 
ministered. Statistical theory can hardly be expected to dispense 
with the need for this assumption. Presumably the assumption will 
be tolerable in some practical situations but not in others; it must
therefore be justified empirically for the actual practical situation 
in which the theory is to be applied. 


3. Results


Table 1 shows comparative statistics for (a) the 10 examinee-sample
estimates, arranged in order according to their value of D,
(b) the item-sample estimate of the norms distribution, (c) the
norms population. It is seen that the item-sample estimate of the
mean is closer to the norms-population mean than are 7 of the 10
examinee-sample estimates. The item-sample estimate of the variance
is closer to the norms-population variance than 5 of the 10
examinee-sample estimates.


TABLE 1
Comparison of Estimates of Norms Distribution

                   Number of   Number (N)                     D for raw      D for      D' for
                   items per   of                             distri-        fitted     fitted
Data               examinee    examinees   Mean   Variance*   bution         H(x)       H(x)

Examinee-Sample
  Group 5             70          100      42.9    214.5       49.3          10.6       10.0
        9             70          100      44.3    199.1       44.4           6.6        5.5
       10             70          100      46.5    188.0       41.0           7.7        3.8
        6             70          100      44.1    212.6       39.2           8.4        6.6
        2             70          100      48.1    194.1       38.1          19.2        5.3
        1             70          100      46.3    174.8       37.4           5.9        4.2
        7             70          100      45.1    150.5       36.7           7.6        7.6
        4             70          100      46.1    155.0       34.4           6.3        6.1
        8             70          100      45.8    153.7       33.8           6.5        6.4
        3             70          100      46.3    170.5       33.0           5.72       4.5
Item-Sample            7         1000      46.1    162.1        —             5.67       5.1

Norms Population      70         1000      45.5    181.7        —              —          —

* $\sum_{a=1}^{N} (x_a - M)^2 / (N - 1)$.


The negative hypergeometric distribution with a mean and
variance equal to the item-sample estimates is much closer to the
norms-population distribution than are any of the 70-item test-score
distributions obtained by sampling examinees: the discrepancy
index for the former is D = 5.7, whereas those for the latter
range from 33.0 to 49.3, as shown in the antepenultimate column of
Table 1. This is hardly a fair comparison, however, since most of
the discrepancies counted against the examinee-sample distributions
are due to local irregularities, arising from sampling fluctuations.
These could be eliminated by replacing the jagged actual
distributions by smooth, fitted distributions.

A fairer comparison is provided in the next-to-last column of the
table. Here each examinee-sample distribution has been replaced by
a negative hypergeometric with the same mean and variance, and a
D has been computed between this H(x) and the actual norms
distribution. This comparison still shows the item-sample estimate
to be closer to the norms population than any of the H(x) in any of
the 10 groups of 100 examinees.

A closer study of the terms summed to obtain D showed them all
to be reasonably small, except, in several groups, the term
contributed by test scores of 67 and above. This indicates that H(x)
does not adequately represent certain groups at the very top of the
score distribution. The discrepancy index D', shown in the last
column of the table, was obtained by omitting from D the unduly
large term for scores ≥ 67, so as to provide a still fairer comparison.
This shows that when discrepancies near the test ceiling are
excluded, the item-sample estimate is superior to the examinee-sample
estimate obtained from 7 out of the 10 samples.

Each of the values in the last two columns of Table 1 represents
discrepancies between an estimated norms distribution, in the form
of an H(x), and the actual, unsmoothed norms. The question of
course arises: How well can the actual norms distribution be
approximated by any H(x)? To answer this question, an H(x) was
fitted to the actual norms distribution. The resulting index of
discrepancy is D = 5.4 or D' = 2.4. The value of D corresponds
roughly to a chi-square of 54.4 with 45 degrees of freedom, which
would be at about the 15 per cent significance level. Examination
of Figure 1 and a breakdown of the chi-square into portions with
single degrees of freedom shows that H(x) provides a reasonably
good fit to the actual norms distribution, except, perhaps, at the
very top.


4. Discussion 


The results of this study suggest that, if a choice must be made,
it may be better in some cases to estimate norms by sampling items
rather than by sampling examinees. Because of the labor involved,
only one item-sample estimate was made in the present study; hence
there is much need to test out this tentative conclusion on other sets
of data.

On the other hand, there is good reason to hope for still better
results than those obtained here, since in another study the items
should be sampled without replacement, so that all items are
represented in the samples rather than only two-thirds of them as
happened here. Further improvement could be obtained by drawing
more item-samples so that each item would appear in several
different short tests.

[Figure 1. Norms group frequency distribution and [...] estimates of it. Horizontal axis: test score; vertical axis: frequency. Curves: norms, item sampling, and three examinee-sample groups.]

The following limitations and comments on the method should be
noted:


1. The item-sample method assumes that performance on an item
does not depend on the context in which it occurs. This is a
serious assumption that must be evaluated carefully in each
new practical situation.

2. In particular, the foregoing assumption means that the method 
cannot be satisfactory for speeded tests. 

3. Granted the assumption made in 1 above, the item-sample 
method gives an unbiased estimate of the norms mean and 
variance. As described here, the success of the method then 
depends on the possibility of estimating the entire norms dis- 
tribution from this estimated mean and variance. Experience 
(backed up by a little theory) indicates that the negative 
hypergeometric distribution is frequently adequate for this 
purpose for tests that are scored number right. There is as yet 
no experience with tests scored by other methods. 


Higher moments of the norms distribution could be estimated 
effectively if desired; however, this would be computationally 
laborious and expensive. 

An obvious question asks for the optimum number of items and of 
examinees to use for a particular norming job. Further experience 
will be needed before answering this question. To do so will require 
evaluating the various possibilities of trading items for people in a
wide variety of situations.


REFERENCES 


Hooke, Robert. “Symmetric Functions of a Two-Way Array.”
Annals of Mathematical Statistics, XXVII (1956), 55-79.

Johnson, M. Clemans and Lord, Frederic M. “An Empirical Study
of the Stability of a Group Mean in Relation to the Distribution
of Test Items Among Pupils.” EDUCATIONAL AND PSYCHOLOGICAL
MEASUREMENT, XVIII (1958).

Keats, John A. and Lord, Frederic M. “A Theoretical Derivation of
the Distribution of Mental Test Scores.” Psychometrika, in press.

Kendall, Maurice G. and Stuart, Alan. Distribution Theory. (Volume
I of The Advanced Theory of Statistics.) New York:
Hafner, 1958.

Lord, Frederic M. “Use of True-Score Theory To Predict Moments
of Univariate and Bivariate Observed Score Distributions.”
Psychometrika, XXV (1960), 325-342.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 2, 1962


THE DETERMINANTS OF READING COMPREHENSION 


PHILIP E. VERNON 
Institute of Education, University of London 


CONTEMPORARY tests of educational abilities are far more sophis- 
ticated instruments than the new-type attainment tests of the 
1920’s and '30's. At the high school and college levels they avoid 
questions relating to straightforward factual knowledge and skills, 
since these have been found to stimulate undesirable methods of 
study. Instead, as Brownell (1946), Dressel (1954) and Bloom 
(1956) point out, their aim is to elicit understanding of principles, 
ability to apply knowledge, critical thinking, judgment, and other 
educationally valuable qualities. 

The present writer, however, brought up in a European educational
system which still relies primarily on the essay examination
for eliciting these higher educational qualities from its students, is
inevitably struck by certain weaknesses in such tests. Whatever the
subject matter (English, social studies or natural sciences), they
tend to take the form of complex reading comprehension tests, and
they therefore appear to depend partly on the students’ facility in
understanding the instructions and coping with multiple-choice
items. While admitting that fluency and legibility in essay writing
unduly affect the students’ performances in any European
examinations, he wonders whether new-type test sophistication does not
equally distort assessments of abilities in the American setting
(Vernon, 1958). These suspicions are confirmed by a study of the 
experimental literature and of test manuals, from which it appears 
that the correlations between tests aimed at different mental func- 
tions or different school subjects are extremely high. For example, 
the mean intercorrelation for five of the Iowa Tests of Educational
Development among 9th and 12th grade students is quoted as .716,
where the mean reliability is .905; yet these tests are supposed to 
measure such different abilities as: 1. Basic Social Concepts, 5. 
Reading in Social Studies, 6. Reading in Natural Science, 7. Inter- 
preting Literary Materials, and 8. Vocabulary. A battery of this 
kind is highly inefficient for differential predictive purposes, and it 
is noteworthy that, in Horst’s (1959) extensive studies of differential 
prediction, the tests which generally contribute most to the regres- 
sion equations are the more factual ones such as Scientific Vocabu- 
lary, Mathematical Concepts, together with specific high school 
grades and Strong or Kuder interests. Though several investigations 
have claimed to show good differentiation among tests in different 
subjects (e.g., Shores, 1943), far too often there is no indication of 
the extent to which score differences may not be due merely to the 
imperfect reliability of the contrasted tests.
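
To see how little room the quoted figures leave for differential measurement, the mean intercorrelation can be set against the mean reliability. The sketch below uses Spearman's classical correction for attenuation, which the article itself does not apply; it further assumes that the mean reliability of .905 may stand in for the reliability of each test in a pair. The function name is illustrative.

```python
from math import sqrt


def disattenuated(r_xy, rel_x, rel_y):
    """Correlation between two tests corrected for the unreliability of both."""
    return r_xy / sqrt(rel_x * rel_y)


# Figures quoted for the Iowa Tests of Educational Development.
estimate = disattenuated(0.716, 0.905, 0.905)
print(round(estimate, 3))  # 0.791
```

On these assumptions the tests would correlate about .79 if they were perfectly reliable, so roughly five-eighths of their true-score variance (.79 squared) would be shared.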

Evidence regarding the measurement of different levels, or types, 
of ability is somewhat contradictory. Davis (1944) attempted to 
measure nine skills hypothesized in reading, but Thurstone (1946) 
claimed that all the intercorrelations could be attributed to a single 
Vocabulary + comprehension factor. Derrick (1953) suggests that 


two aspects, and many 


By contrast, Howard (1943) extracted five subtests of items 
representing different level,


lated far below the reliability coefficients with ordinary course
marks.

The writer would contend that a major reason for high correla- 
tions between tests aimed at somewhat different abilities is their 
common item-form and their dependence on the students’ sophisti- 
cation. In the same way essay examinations in different subjects, 
or marked for different qualities, are apt to intercorrelate highly, 
but to show lower correlations with new-type tests in the same 
subject. Both test-content or function and test-form or method are 
highly complex, but a possible analysis (partly derived from Thorn- 
dike, 1949) is in Table 1. 

TABLE 1

Sources of Variance in Educational Test Scores

CONTENT

   Level of integration: interpretation, understanding vs. knowledge of
      concepts and operations
   Different functions at a given level, e.g., evaluation, application to
      new problems, retention, etc.
   Technical bias: scientific, historical, literary, etc.

METHOD

   Factors attributable to the medium or material of the test (verbal,
      symbolic, visual, performance, etc.) irrelevant to its purpose
   Type of presentation: oral, visual, verbal; questions asked
      simultaneously or subsequently
   Difficulty level and speed conditions, which may modify factor content
   Response-type: multiple-choice, matching, etc.; short-answer (in
      student's own words), oral, essay (restricted time), dissertation
   Attitudes and motivations induced by the tester and testing conditions
   Sophistication: knack in coping with ... of item, ...pected clues,
      guessing and using time wisely, etc.
   Sets arising from the student's understanding (or misunderstanding) of
      instructions or transferred from previous tests; adaptations in the
      ... doing the test
   Other response sets, e.g., tendency to guess, to choose extreme
      responses, etc.
   Speed and accuracy in recording answers on answer sheets; speed and
      legibility in written answers

ERROR

   Sampling variance between the passages or questions set
   Fluctuations in moods, attention, fatigue and health of students
   Scoring errors in objective tests
   Variance between markers in essay or other non-objective assessments


Educationists and psychometrists, in their concern over Content
and Error, have paid little attention to the Method category; and
even if its influence on test or examination performance is small,
we should at least attempt to investigate its components more fully.
Though correlational factor analysis can help, there is always the
danger of interpreting the nature of a factor in terms of content, 
which really derives more from common method components. (For 
example, the distinction often drawn by factorists between I—in- 
ductive, and D—deductive, reasoning may arise more because most 
I-tests are based on non-verbal, numerical or letter material, most 
D-tests on verbal material, than because they involve different 
kinds of thinking.) A Fisherian type of factorial design would 
therefore be more appropriate for studying content and method 
variances simultaneously. The same conclusion was reached by 
Campbell and Fiske (1959) in the field of personality: “Method or 
apparatus factors make very large contributions to psychological 
measurements”; and they advocate the Multitrait-Multimethod 
Matrix to overcome this. 


Design of Investigation 


An investigation was carried out among 108 male students in a
British Training College for Teachers. The time available for
testing, and the students’ patience, permitted the exploration of only a
few of the factors listed in Table 1. However, the value of the study
was greatly enhanced through its repetition with a group of 75
American college students, for which Dr. W. Coffman of the
Educational Testing Service and Professor D. L. Cook of Purdue
University were responsible.

Several tests were constructed, or adapted, in two parallel forms, 
to cover various content and method differences in the area of 


session or occasion, to be controlled. The order of application was 


arranged to provide variety. Numbers of items and time limits are 
listed in Table 2. 


1 The views expressed in this article must not, of course, be ...
collaborators. ...
loan of certain test materials and for payments made to the




Tests. 1. Vocabulary A and B. Definitions to be written by the 
students in their own words, either synonyms or short phrases. 

2. Vocabulary C and D. Multiple-choice. 

3. Sentence Completion A and B. Filling gaps in sentences, multi- 
ple-choice. 

4. Reading Comprehension A and B. Answering questions on
200-400 word passages, in own words. 

5. Reading Comprehension C and D. Ditto, multiple-choice. 

6. Reading Comprehension E and F. In each test two passages 
of 400-500 words were presented, to be read in 34 minutes without 
seeing the questions; thereafter questions were given to be answered 
without consulting the passages, one lot in own words, the other 
multiple-choice.

The questions in Tests 4-6 were further cross-classified, half of 
them being based primarily on factual information given in the 
paragraphs, half requiring more inference or judgment. Thus each 
pair of tests could be alternatively scored for factual or inferential 
abilities. 

7. Reading Comprehension G and H. This was a published, multi- 
ple-choice test which aims particularly to measure inference from 
social studies materials. 

8. The British group only was given a multiple-choice test of com- 
prehension of tables and graphs, referring mainly to medical and 
psychological statistics.

9-11. The American group had previously taken the Nelson- 
Denny Reading Test (Vocabulary and Comprehension) and an 
Entry English Test, two-thirds of whose items dealt with sentence- 
structure, punctuation and spelling, the remaining third with
vocabulary and reading. The numbers of available subjects dropped
to 67 and 62, respectively. 

12. An external criterion of intellectual competence, in the form 
of grade-point averages, was available for the American students. 
British students wrote essays on educational themes during the year 
and took an essay examination in education two months later. As 
their papers were marked by various lecturers, the marks were
expected to be weak in reliability. Fourteen months later they took a
more thorough, and uniformly marked, examination; but its results
were published only in the form of very coarse, and therefore also
unreliable, grades.




The scoring of the “own-word” or “creative-response” tests, Nos.
1, 4 and half of 6, was done by the writer alone. However,
acceptable and unacceptable answers were listed, and it is believed that
no greater subjectivity was involved than in scoring, say, Terman- 
Merrill vocabulary. 

The main hypotheses which can be tested by the above battery 
are listed in an order following that of Table 1: 

I. Tests at different levels (Reading A-H versus Vocabulary 
A-D) involve partially distinguishable abilities. Sentence comple- 
tion should be intermediate. 

II. Tests of different functions, such as factual versus inferential,
will show some distinction, though this may be small. For the classi- 
fication of items was made by the writer alone, and Derrick (1953) 
has shown that inter-judge consistency is not high. 

III. The Table and Graph Reading test involves verbal and nu- 
merical questions and responses; but as these refer to visual ma- 
terial, it should measure a somewhat different ability from the 
verbal tests. 

IV. Type of presentation: Reading Tests E and F, in which 
students cannot search for the answers in the paragraphs, will meas- 
ure a different ability from conventional reading tests, and will show 
a higher validity against an examination criterion. 

V. Form of response: Tests responded to in own words will cor- 
relate more highly among themselves, also multiple-choice tests 
among themselves, than own-word with multiple-choice. 

VI. American students will do relatively better at reading tests, 
to which they are far more accustomed, than British students, and 
their responses to multiple-choice tests will be more structured and 
consistent. 


VII. British students, but not American, will improve with prac- 
tice at these same tests. 


VIII. There will be a small but appreciable sessional effect, that 
is, higher correlations within than between sessions. 


Results 


Both groups of students were mostly in their early twenties. The 
British were all male, but the American included 32 men and 43 
women. No appreciable difference between the sexes could be
observed, except in the Entry English test; thus no distinction was
made in analyzing the results.


Table 2 gives the main information about each test in each group.
It will be seen that American students were generally allowed
longer times since it was thought that they were somewhat less
highly selected than the British. Clearly, direct comparisons of
means are not legitimate. Nevertheless it is obvious that the British
are at least 1σ superior on both vocabulary tests, despite their
shorter time, and yet they obtain much the same scores on reading
tests. It might conceivably be argued that British students receive
much less training in reading at school, and thus perform less well
than might be expected in view of their good vocabulary. But it
seems much more likely that American students do relatively well
at reading tests because of their familiarity with such tests and
their expertise in coping with this item-form. The British would
have had virtually no experience of new-type tests (except possibly
some eight years earlier when they entered high school). This
is confirmed by the fact that the British show larger and more
consistent gains in score from the first to the second occasion on all
reading tests, but not on vocabulary. On the combined tests the
British rise of .30 sigma units is highly significant; the American
rise only slightly exceeds its standard error. Hypothesis VII, and
the first half of Hypothesis VI, are therefore confirmed.


TABLE 2

Numbers of Items in Combined Tests, Total Times, Means, and
Score Increases from First to Second Sessions

                         Form of    No. of Items    Time (mins)     Mean Scores     Standard Deviations   Score Gains
Test                     Response   in Both Parts   Amer.   Brit.   Amer.   Brit.   Amer.     Brit.       Amer.   Brit.

Vocabulary AB            cr         40              16      14      ...     ...     5.72      6.66        -0.17   +0.49
Vocabulary CD            mc         56              16      12      ...     ...     7.69      8.12        -0.05   -0.31
Sentence Completion AB   mc         36              ...     ...     ...     ...     6.40      5.20        +0.02   +0.07
Reading AB               cr         33              24      30      ...     ...     4.16      3.88        +0.14   +0.60
Reading CD               mc         34              ...     ...     ...     ...     4.94      4.15        +0.33   +0.28
Reading EF (cr)          cr         16              34      30      ...     ...     2.88      2.39        -0.08   +0.31
Reading EF (mc)          mc         15                              ...     ...     2.67      2.51
Reading GH               mc         ...             34      27      ...     ...     4.01      5.00        +0.13   +1.08
Tables & Graphs          mc         50              -       40      -       ...     -         6.09
N-D Vocabulary           mc         100             ...     -       ...     -       14.15     -
N-D Reading              mc         36              ...     -       ...     -       10.72     -
Entry English            mc         225             ...     -       ...     -       ...       -

The low scores, particularly among American students, on crea- 
tive-response Vocabulary are noticeable. Clearly the test was made 
much too difficult. However, the four vocabulary lists, A, B, C, and 
D, were drawn from a common pool of words, and they were known 
from previous item-analysis data to be of closely similar difficulty. 
Thus the differences between the means for Vocabulary CD and 
Vocabulary AB provide an indication of how much easier it is to 
recognize than to formulate a correct concept. Similarly Reading 
CD is easier than Reading AB, though not so markedly. 

Sessional Variance. All 14 part-scores from the seven main tests 
were intercorrelated in each group by the tetrachoric method, tak- 
ing splits close to the tertiles and averaging the two coefficients so 
obtained. Changes from one session to the other in students’ motiva- 
tion, or general adaptation to the testing situation, will be shown 
by contrasting the correlations of different tests at the same and at 
different sessions. 
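
The tetrachoric coefficients referred to above would originally have been read from computing diagrams or tables. As a rough modern equivalent, the sketch below finds the correlation of a bivariate normal surface that reproduces a given fourfold table, using numerical integration and bisection. It is an illustration under stated assumptions rather than the procedure used in the study: the function names are hypothetical, the cell counts in the example are invented, and all four cells are assumed to be non-empty.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()


def upper_quadrant(h, k, r, steps=400):
    """P(X > h, Y > k) for a standard bivariate normal with correlation r (Simpson's rule)."""
    s = sqrt(1.0 - r * r)
    upper = max(h, 0.0) + 8.0          # the density is negligible beyond this point
    width = (upper - h) / steps

    def f(x):
        # Density of X times the conditional probability that Y exceeds k.
        return STD.pdf(x) * (1.0 - STD.cdf((k - r * x) / s))

    total = f(h) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(h + i * width)
    return total * width / 3.0


def tetrachoric(a, b, c, d):
    """a: high on both tests; b: high on X only; c: high on Y only; d: low on both."""
    n = a + b + c + d
    h = STD.inv_cdf(1.0 - (a + b) / n)   # cutting point on X
    k = STD.inv_cdf(1.0 - (a + c) / n)   # cutting point on Y
    lo, hi = -0.999, 0.999
    for _ in range(50):                  # bisection: the quadrant probability rises with r
        mid = (lo + hi) / 2.0
        if upper_quadrant(h, k, mid) < a / n:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Invented counts for two splits of the same pair of tests, one near each tertile.
upper_split = tetrachoric(20, 16, 16, 56)
lower_split = tetrachoric(54, 18, 18, 18)
averaged = (upper_split + lower_split) / 2.0
```

Averaging the two coefficients, as in the last line, mirrors the procedure described in the text of taking splits close to the tertiles and averaging the results.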


TABLE 3

Mean Correlations Showing the Sessional Effect

                                        Amer.   Brit.
42 correlations within sessions         .584    .427
42 correlations between sessions        .487    .408

The difference is appreciable in both groups, the average correlation
being 5 to 10 per cent higher within the same occasion than
between occasions. Doubtless this difference would increase with a 
longer time interval. Much the same effect could have been shown 
by contrasting the Kuder-Richardson and the repeat reliabilities of 
the seven tests. For the British group these averaged .697 and .626, 
respectively. 

There is no obvious technique for testing the significance of
differences between mean tetrachorics. However, if the distributions of
coefficients themselves are compared, the American within-session
mean exceeds the between-session mean at the .001 level. The
British difference is not significant. Over-all, it would appear that
Hypothesis VIII is confirmed.


Factual versus Inferential Questions. When the scores on Read- 
ing Tests AB, CD, and EF for factual and inferential questions are 
separated, their tetrachoric intercorrelations yield the means shown 
in Table 4. Also listed are the correlations of both types of score 
with Reading Test GH, most of whose items are strongly inferential. 


TABLE 4

Mean Correlations Showing Effects of Function Tested

                                                         Amer.   Brit.
6 correlations within factual or within inferential      .585    .385
6 correlations between factual and inferential
   scores on different tests                             .487    .397

3 inferential scores with Reading GH                     .563    ...
3 factual scores with Reading GH                         .482    .313


The American, but not the British, results confirm Hypothesis II. 
However, in both groups the correlations with Reading GH are 
confirmatory. Though there is no satisfactory test of the signifi- 
cance of differences between mean correlations, it would appear 
that functions in reading can be differentiated rather more readily 
than Derrick believed, and they may contribute some 10 per cent 
of variance. But it should be noted that, in this investigation (un- 
like Davis’ and Derrick’s), the two types of questions referred to 
different passages; and it might be argued plausibly that the differ- 
ence is not so much one of psychological function as of technical 
bias (cf. Table 1). Thus the passages yielding inferential questions 
were all of a philosophical, psychological, or aesthetic nature,
whereas the passages yielding factual ones were more descriptive:
scientific, geographical, or historical.

Retentive Reading versus Immediate Comprehension. The
experimental design was inadequate for covering Hypothesis IV.
Nevertheless Table 5 provides a little evidence, and more will
emerge in Table 10. Here, all tetrachoric correlations are between
multiple-choice and own-word scores, that is, item-form is held
constant.

The correlations within Immediate Comprehension tests (AB,
CD) are no higher than between these and the Retentive tests (EF).
However, the single correlation within the two halves of the
Retentive test is higher in both groups, indicating that type of
presentation may have some influence.


TABLE 5

Correlations Within and Between Retentive and Immediate
Comprehension Tests

                                                         Amer.   Brit.
4 correlations within Immediate Comprehension Tests      .485    .928
8 correlations between Immediate and Retentive Tests     .535    .359
1 correlation within Retentive Tests                     .60     ...

Multiple-choice versus Creative (Own-Word) Responses. The
scores on the two forms of each test were now combined, and the 
product-moment correlations between all tests are shown in Table 
6. The coefficients for American students are above the diagonal. 

TABLE 6

Intercorrelations of All Tests (American Group Above, British Group Below the Diagonal),
and Kuder-Richardson Reliabilities

Columns, in order: Voc. AB; Voc. CD; Sent. Comp.; Rdg. AB; Rdg. CD; EF cr; EF mc;
Rdg. GH; N-D; Entry; Tab. & Gra.

Vocabulary AB          .892  .802  .770  .578  .477  .483  .303  .457  ...
Vocabulary CD          .844  .87   .726  .575  .530  .448  .508  .487  ...
Sentence Completion    .826  .702  .687  ...   .656  ...   .587  .707  .711  ...
Reading AB             .626  .586  ...


Kuder-Richardson Formula 20 reliability coefficients are shown
along the diagonal. Some of these are very low, particularly for
Reading test EF, where the creative and multiple-choice scores were 
based on only 15 or 16 items. But the tests were intentionally kept 
short to avoid over-taxing the students’ patience, and there is no 
reason to suppose that longer tests, containing more carefully pre- 
tried items, would not reach acceptable levels of reliability. 
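
The two quantities in this argument, the Kuder-Richardson Formula 20 coefficient and the reliability to be expected from a longer test, can be sketched as below. The data are a toy matrix of right/wrong item scores, not the study's, and the projection uses the Spearman-Brown formula, which the article does not name but which is the usual basis for such a claim; it assumes the added items are parallel to the existing ones.

```python
def kr20(item_scores):
    """Kuder-Richardson Formula 20 for rows of 0/1 item scores (one row per student)."""
    n_students = len(item_scores)
    n_items = len(item_scores[0])
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n_students
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_students
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in item_scores) / n_students  # proportion passing item j
        sum_pq += p * (1.0 - p)
    return n_items / (n_items - 1) * (1.0 - sum_pq / var_total)


def spearman_brown(reliability, length_factor):
    """Projected reliability when a test is lengthened by the given factor."""
    return length_factor * reliability / (1.0 + (length_factor - 1.0) * reliability)


# Toy data: five students, four items.
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(toy), 3))                 # 0.8
print(round(spearman_brown(0.5, 2.0), 3))  # 0.667
```

On this formula a 15-item test with a reliability of .50 would be expected to reach about .67 if doubled in length and .75 if trebled.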

A comparison of all the multiple-choice and creative-response 
correlations does not appear to bear out Hypothesis V. For example, 
Vocabulary AB does not correlate more highly with Reading AB 
(both creative) than with Reading CD (multiple-choice). Table 7 
shows the mean coefficients. 


TABLE 7

Mean Correlations Within and Between Multiple-Choice ...

                                                         Amer.   Brit.
6 correlations within multiple-choice                    .595    .432
3 correlations within creative                           .484    .488
Mean                                                     .558    .451

11 correlations between multiple-choice and
   creative (omitting Vocab. AB with Vocab. CD)          .562    .455


A further examination was made of the residual correlations after 
removing the Content factors by factor analysis (see below). A 
majority of these tended to be positive for within-item-form co- 
efficients, negative for between-item-form, but not to a significant 
extent. If any variance is attributable to this difference in item- 
form, it can hardly amount to more than 1 or 2 per cent. It is
unfortunate that a wider difference, as between essay-form and
multiple-choice form, could not readily be investigated. Nevertheless some
further evidence will emerge in the factor analysis. 
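
The comparison in Table 7 rests on a pooled mean of the within-form correlations, weighted by the number of coefficients in each set. The short sketch below reproduces that arithmetic from the tabled figures; the function name is illustrative, and the simple weighted mean is an assumption about how the "Mean" row was obtained, although it does reproduce the printed values.

```python
def pooled_mean(groups):
    """Mean correlation pooled over sets given as (number of coefficients, mean) pairs."""
    total = sum(count for count, _ in groups)
    return sum(count * mean for count, mean in groups) / total


# Within-form sets from Table 7: multiple-choice (6 coefficients), creative (3).
american_within = pooled_mean([(6, 0.595), (3, 0.484)])
british_within = pooled_mean([(6, 0.432), (3, 0.488)])

# Between-form means from Table 7.
american_between, british_between = 0.562, 0.455

print(round(american_within, 3), round(british_within, 3))  # 0.558 0.451
print(round(american_between - american_within, 3))         # 0.004
print(round(british_between - british_within, 3))           # 0.004
```

Hypothesis V requires the within-form mean to exceed the between-form mean. In both groups the difference runs very slightly the other way, which is the sense in which the hypothesis is not borne out.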

Comprehension of Visual Items. The correlations of the Table and 
Graph Reading test with the seven verbal tests in the British group 
are clearly very low and irregular (Table 6), thus apparently
confirming Hypothesis III that test medium or material makes an
important difference. However, it will be seen that the coefficients are
highest for Reading tests AB and EF (multiple-choice), lowest for
Reading GH and Vocabulary. Unfortunately EF and AB happened
to include more passages with a natural science content, whereas
GH was based wholly on social science materials. Thus the obtained
differentiation probably arises as much, or more, from what was
labeled Technical Bias in Table 1, as from test medium.

Level of Integration: Vocabulary versus Comprehension. All 
eleven verbal tests were factor analyzed by the centroid method in 
the American group, and the eight tests (omitting Tables and
Graphs) in the British group. No significant variance remained in 
either matrix after two factors had been extracted, hence factoriza- 
tion was stopped and the calculations repeated until the guessed 
communalities coincided with the obtained values (median dis- 
crepancy .002, maximum .006). 

As shown in Table 8, the second factor clearly differentiates read- 
ing comprehension tests from vocabulary tests, in both groups. 
Hence rotation was carried out, passing the first axis through the 
center of gravity of the two vocabulary tests. The second factor 
then yielded loadings for the Comprehension tests approximately as 
great as their Vocabulary loadings (cf. Table 9). 
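
The rotation described above, in which the first axis is passed through the center of gravity of the two vocabulary tests, amounts to a rigid rotation of the two-factor loadings. A minimal sketch follows. The loadings are invented for illustration and are not those of Table 8, and the function and test names are hypothetical.

```python
from math import atan2, cos, sin


def rotate_through_centroid(loadings, reference_tests):
    """Rotate two-factor loadings so that factor 1 passes through the centroid
    of the reference tests. loadings maps test name -> (factor 1, factor 2)."""
    c1 = sum(loadings[t][0] for t in reference_tests) / len(reference_tests)
    c2 = sum(loadings[t][1] for t in reference_tests) / len(reference_tests)
    theta = atan2(c2, c1)  # angle of the centroid from the first unrotated axis
    rotated = {}
    for test, (f1, f2) in loadings.items():
        rotated[test] = (f1 * cos(theta) + f2 * sin(theta),
                         -f1 * sin(theta) + f2 * cos(theta))
    return rotated


# Invented unrotated centroid loadings (general factor, bipolar second factor).
unrotated = {
    "Vocabulary AB": (0.80, -0.35),
    "Vocabulary CD": (0.78, -0.25),
    "Reading AB": (0.70, 0.30),
    "Reading CD": (0.65, 0.35),
}
rotated = rotate_through_centroid(unrotated, ["Vocabulary AB", "Vocabulary CD"])
```

After such a rotation the two reference tests have second-factor loadings that are equal in size and opposite in sign, and every test keeps its communality, since the rotation is orthogonal.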

TABLE 8

Factorial Analysis of Verbal Tests

                          Unrotated Factors     Rotated Factors
                                                British
                                                Comp.    Spec.

Vocabulary AB             ...                   .074     .054
Vocabulary CD             ...                  -.074    -.042
Sentence Completion       ...                   .974     .051
Reading AB                ...                   .583     .019
Reading CD                ...                   .491     .235
Reading EF (cr)           ...                   ...      ...
Reading EF (mc)           ...                   .434     .073
Reading GH                ...                   ...      ...
N-D Vocabulary            ...
N-D Reading               ...
Entry English             ...
Percentage Variance       ...


TABLE 9

Mean Variances of Five Complex Reading Tests

                       Vocabulary   Comprehension   Total
                       Factor       Factor          Communality
American analysis      31.1         32.6            63.7
British analysis       24.4         22.5            46.9


This confirms our first hypothesis and thus contradicts Thurstone’s
verdict on Davis’ tests. Sentence Completion is intermediate
between the Vocabulary and the Comprehension tests, and Nelson-
Denny Reading shows almost the same composition, presumably 
because its reading passages are simpler than those used in the 
writer's tests and it is more highly speeded. The Entry English test, 
consisting mainly of questions about words or separate phrases, ob- 
tains a near-zero Comprehension loading. 

In the American analysis almost all the specificities (Kuder- 
Richardson reliabilities minus communalities) are near zero, except 
those for the Nelson-Denny tests, which probably results from this 
test being given several months previously.² The picture is very
different in the British analysis. The communalities for all Reading
Comprehension tests are lower, and the specificity is especially large
in the multiple-choice tests, CD, EF (mc), and GH. Clearly the
responses of British students to these tests were less organized, more
random, than those of Americans, whereas in the own-word or
creative tests, Reading AB and EF (cr), the two groups were
confronted with equally unfamiliar forms of response and show much
the same specificities. Had the experiment included ordinary written 
essays, we may infer that the American students would have shown 
greater specificities on these. A possible alternative explanation 
might be that the American students are more heterogeneous in 
reading capacity; but this is contradicted by Reading GH, where 
the British variance and reliability are higher. 

External Validation. The correlations with academic grades,
shown in Table 10, are quite low among the British students, partly
because of the low reliability of their education marks, but partly
also because these grades were based on essay-type examinations,
whereas in the American group objective course examinations
would have been generally applied. Several of the predictors were
amalgamated and multiple correlations calculated, with the
results shown in Table 11. In the British group, Vocabulary,
retentive reading (particularly the creative form), and Tables and
Graphs all contribute to the prediction. But apparently the
ordinary reading tests, whether in multiple-choice or creative form, are
so unlike normal student reading activity that their beta-weight is
strongly negative. Even in the American group these reading tests
are again the least valid and so yield a negative beta-weight. The
objective form of retentive reading contributes most, followed by
Sentence Completion, the Entry English test, and Vocabulary.
Note that all three creative-response tests are slightly less
predictive than their multiple-choice parallels among students who
are more accustomed to the latter.


TABLE 10

Correlations with Academic Grades

                             American   British

Vocabulary AB                .272       .163
Vocabulary CD                .941       .198
Sentence Completion          .421       .128
Reading AB                   .281       .039
Reading CD                   .296       .085
Reading EF (cr)              .225       .276
Reading EF (mc)              .434       .203
Reading GH                   .295       .081
Tables and Graphs            -          .218
Nelson-Denny Vocabulary      .129       -
Nelson-Denny Reading         .237       -
Entry English                .340       -


TABLE 11

Multiple Correlations

                             Beta
American                     Coefficients
Vocabulary ABCD               .0259
Reading ABCDGH               -.1672
Sentence Completion           .2332
Reading EF (mc)               .3218
Entry English                 .1169
                             Rm = .480

British
Vocabulary ABCD               .2017
Reading ABCDGH               -.2682
Sentence Completion          -.0630
Reading EF (cr)               .2924
Reading EF (mc)               .1241
Tables and Graphs             .1847
                             Rm = .392


2 Vocabulary CD also shows a suspiciously large specificity. Possibly this
arises because the multiple-choice responses were chosen ...
many of them might represent English, rather than current American,
...tations of word-meanings.

3 Correlations with the further criterion of achievement, one year later, are
not quoted here since they were even lower. However, they followed just the
same pattern.
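
The multiple correlation in Table 11 can be checked from the beta coefficients and the validities of the predictors, since the squared multiple correlation is the sum of the products of each standardized weight and the corresponding validity. The sketch below does this for the American group. It assumes that the validities of the two amalgamated predictors are the values .326 and .333 legible in Table 12 and that the other three are the American entries of Table 10; the variable and function names are illustrative.

```python
from math import sqrt

# (predictor, beta from Table 11, validity taken from Table 10 or Table 12)
american = [
    ("Vocabulary ABCD", 0.0259, 0.326),
    ("Reading ABCDGH", -0.1672, 0.333),
    ("Sentence Completion", 0.2332, 0.421),
    ("Reading EF (mc)", 0.3218, 0.434),
    ("Entry English", 0.1169, 0.340),
]


def multiple_r(terms):
    """Multiple correlation from standardized weights and validities: R^2 = sum(beta * r)."""
    return sqrt(sum(beta * r for _, beta, r in terms))


print(round(multiple_r(american), 3))  # 0.48
```

The result agrees with the tabled Rm of .480, which supports this reading of the tables. It also shows how the negative weight for the ordinary reading tests arises: their validity of .333 is modest, while they overlap heavily with the better predictors.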

While it would be possible to calculate the validities of Vocabulary
and Comprehension factor scores, it is simpler, and as effective,
to obtain the combined validities of Vocabulary AB + CD and
of Reading AB + CD + GH by Correlation of Sums, and then
find the partial validity of the latter, holding the former
constant. The results are given in Table 12.


TABLE 12

Validities of Combined Tests

                                       Amer.   Brit.

Vocabulary ABCD X Criterion            .326    ...
Reading ABCDGH X Criterion             .333    ...
Vocabulary X Reading                   .616    .617

Reading X Criterion with
   Vocabulary constant                 ...     -.02


We may deduce that the Comprehension factor can make a
small but useful contribution to prediction of educational
achievement among American students, over and above Vocabulary, but
that it actually possesses a small negative validity among British
students.
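
The partial validity discussed here follows from the three correlations among Vocabulary, Reading and the criterion by the usual first-order partial correlation formula. The sketch below applies it to the three values that are legible in Table 12, which are assumed to be the American figures; the function name is illustrative.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x with y, holding z constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


# x = Reading ABCDGH, y = Criterion, z = Vocabulary ABCD.
reading_partial = partial_correlation(r_xy=0.333, r_xz=0.616, r_yz=0.326)
print(round(reading_partial, 3))  # 0.177
```

A value of about .18 is consistent with the description of a small but useful contribution among American students, although the corresponding entry of the table is not legible in this copy and the computed figure should not be taken as a restored value.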


Conclusions 
All our specific hypotheses except V (and possibly ...) are
borne out. But the general trend of the results goes against the
initial argument, in that it suggests that what were termed Content
factors (particularly “Level” and “Technical Bias”) have a
much stronger influence than Method factors. Nevertheless, the
superior validity of reading tests which employ an unconventional
method, that involving retention, suggests that Method variations 
would be worth further investigation. A similar test, called Di- 
rected Memory, was developed by Coffman and Papachristou 
(1955) and is now incorporated in the E.T.S. Law Schools Admis- 
sion test, where it shows good validity, over and above ordinary 
reading comprehension. 

Although the clearly demonstrated Comprehension factor,
orthogonal to Vocabulary, would seem at first sight to reflect the
difference between comprehending connected arguments and
comprehending isolated concepts, it must be viewed with considerable
suspicion in view of the British students’ weak and poorly organized


Reading Comprehension test, though presumably the great majority
of American students have become so accustomed to it that,
in their case, it contributes to the measurement of a valuable study 
skill. Others who are less sophisticated cannot so readily translate 
their understandings of the passages into terms of its conventions; 
hence for them the Comprehension component (as distinct from 
the Vocabulary component) has no predictive validity whatever. 

More generally it would follow that, in the evaluation of complex
skills and understandings, it is desirable to employ techniques
which will resemble as closely as possible the ways in which these 


knowledge retained by the viewer or listener provides an adequate
criterion of the total impact of these media on his thinking and
behavior.


Summary 


1. Assessments of the understanding of complex concepts, whether
by objective tests, written essays, oral or other methods, are affected
not only by the level and type of concepts tested, but also by many
factors arising from the method of testing, and the subject’s facility
in handling it. Certain weaknesses in current new-type achievement
tests, such as their poor differential predictive capacity, may result
from neglect of such method components.
2. An investigation was designed to elicit the relative variance of
certain test-content, method, and error components, and was
carried out among 108 British and 75 American college students. The
former group was more highly selected and was much superior in
vocabulary tests; but its lack of superiority in reading comprehension
tests, and its significant improvement with practice, illustrate
the importance of facility or sophistication at such tests.
3. Parallel forms of seven specially constructed tests of vocabulary
and reading were applied in two sessions. A sessional or
“Occasion” effect was demonstrated by the higher correlations among
tests within than between sessions, particularly in the American
group.
4. Three reading tests included passages and questions, some of
which called for relatively factual, others for more inferential,
comprehension. Higher correlations were obtained within than
between these types of comprehension, in both groups. However, the
difference may reflect the technical bias of the respective passages
rather than the psychological functions aimed at.
5. A test in which the questions were not seen or answered until
after the completion of the reading of the passages (i.e. involving
retention) appeared to measure a somewhat different ability from
the conventional immediate-comprehension test, and it was found
to be considerably more valid than the latter in the prediction of
academic achievement.
6. The writing of responses to vocabulary and reading questions
by students in their own words (creative type) did not, as had been
hypothesized, involve a different ability from the objective or mul- 
tiple-choice type of response. However the scores of British students 
On multiple-choice reading tests showed very high specificity, s 
\ flecting their unsophisticated and relatively unorganized approach 


to these tests. 




7. Centroid factor analyses revealed a strong Comprehension fac- 
tor, orthogonal to the Vocabulary factor, among both groups in 
the reading tests. The validity of this factor in predicting achieve- 
ment among American students is positive (though lower than that 
of Sentence Completion, Retentive Comprehension, and English
Usage). Among British students its validity is slightly negative.


REFERENCES


Brownell, W. A., et al. The Measurement of Understanding. Forty- 
Fifth Yearbook of the National Society for the Study of Educa- 
tion. Chicago: University of Chicago Press, 1946. 

D. W. “Convergent and Discriminant Validation by the Multitrait-
Multimethod Matrix.” Psycho-


Horrocks, J. E. “The Relationship betw. 
Development and Ability to Use Su 
Applied Psychology, XXX (1946), 501-508.


Н to the Ability to Read History and ‘ 
PY. Journal of Educational Research, XXXVI (1943), * 


Thorndike, R. L. Personnel Selection: Test and Measurement Tech-
niques. New York: John Wiley & Sons, 1949.

Thurstone, L. L. “Note on a Reanalysis of Davis's Reading Tests.” 
Psychometrika, XI (1946), 185-188. 

Vernon, P. E. Educational Testing and Test-form Factors. Prince-
ton, N. J.: Educational Testing Service, RB 58-3, 1958. 


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT
XXII, No. 2, 1962 


RELATIONSHIPS AMONG RESPONSE SETS
AND COGNITIVE BEHAVIORS* 


GARLIE A. FOREHAND 
The University of Chicago 


CONCEPTS of set, attitude, disposition, readiness, response prefer-
ence and the like have long been prominent in psychological re-
search and theory, often serving in theories of general psychology
as hypothetical constructs to aid in predicting and generalizing re-
lationships between stimulus and response (Gibson, 1941). It was
early recognized that individual differences in such variables might
introduce error into experiments, particularly into studies of percep-
tion based upon introspection (George, 1917). Comparatively re-
cently, individual differences in set have been studied in their own
right as variables of personality and differential behavior. Individ-
ual variations in set have been studied primarily under two rubrics:
response sets in test-taking behavior (Cronbach, 1946), and cogni-
tive controls or “system-principles” in perception and cognition
(Gardner, Holzman, Klein, Linton & Spence, 1959).

Cronbach defined a response set as “a tendency causing a person
consistently to give different responses to test items than he would
if the same content is presented in different form” (Cronbach,
1946). Response sets which have been studied include acquiescence
(the tendency to prefer positively worded alternatives), the ten-
dency to gamble, the tendency to strive for speed instead of ac-
curacy, interpretations of the inclusiveness called for in a response
(Cronbach, 1946), preference for extremely worded alternatives

* This article is based on a dissertation submitted in partial fulfillment of
the requirements for the degree of Doctor of Philosophy at the University of
Illinois in 1958. Special appreciation is expressed to Professor Lee J. Cronbach,
under whose direction the study was conducted.


(Berg & Rappaport, 1954) and preference for middle category or 
“uncertain” responses (Rosenberg, Izard & Hollander, 1955).² Cron-
bach’s definition suggests three types of questions for research re-
garding manifestations of any particular response set. These con- 
cern (1) the response set’s reliability or the consistency with which 
it appears within a given test and upon repeated responses to the 
same test, (2) the generality with which the response set appears in 
different types of tests and items, and (3) the psychological mean- 
ing of the response set as a behavioral variable, i.e., its relationships 
with other variables and concepts. The evidence regarding the first 
issue is rather clear: response sets are, in general, reliable, both in
an internal consistency sense and over time (Cronbach, 1950). Re- 
sults regarding the generality and psychological meaning of re- 
sponse sets are considerably less clear (Jackson & Messick, 1958). 
Among the stylistic tendencies in cognitive behavior which have 
been studied under the rubric “cognitive controls,” as summarized 
by Gardner and his associates (1959), are: consistencies in the in- 
clusiveness with which experiences are classified in conceptual 
categories (equivalence range); in the formation of schemata from 
temporally successive stimuli (leveling-sharpening) ; in the deploy- 
ment of attention to components of a stimulus field (focusing) ; and 
in the extent of selective response to relevant versus irrelevant cues 
under conditions of perceived incongruity (field articulation). The
questions for research posed by the definitions of these concepts are 
similar to those regarding response sets (Gardner, et al., 1959, pp. 
140-147). The possibility that response sets are manifestations of 
consistencies in cognitive organization has been suggested by Jack- 
son and Messick (1958). 
The exploratory study reported here investigated (a) the validity
of certain generalizations of the cognitive control concepts of
equivalence range and field articulation to behaviors in paper-and- 
pencil tests resembling behaviors described as manifestations of re- 


² Other “response biases”
the tendency to deviate from a m 




sponse sets, (b) relationships of several previously studied response 
sets to measures of cognitive controls, and (c) relationships among 
measures of response sets defined in different ways and in different 
tests. 


Procedures 
Variables³

The variables used in this study are listed in Tables 1 and 2. It is 
possible to identify a number of analogies between the composition 
of tests in which response sets occur and stimulus configurations in
the presence of which cognitive controls appear to operate. A series 
of such analogies guided the selection of variables for this study. 
These and alternative hypothetical relationships between response 
sets and cognitive controls are considered below in the “results”
section. 

Response set measures (see Table 1). This study employed four 
acquiescence scores, each involving a logical basis which has been 
previously used in the definition of such scores. Variable 1, the 
phony language exam (Nunnally & Husek, 1958), is composed of 
items whose content appears to be meaningful but is not. It was 
constructed by inserting, into English sentences, German words 
whose meanings are obscure; subjects indicate agreement or dis-
agreement with the guessed meaning of the sentences. The English
portions of the sentences imply relationships of frequency or causal-
ity between the referents of the German words. The sentences are
in the forms: “X is always (is always caused by) Y,” or “X is (is
caused by) Y.” (These are four of the eight types of items in the
original scale; the present usage also differed from that of Nunnally
and Husek in that two response alternatives, “agree” and “disagree,”
supplanted their seven-point response continuum, and 24 items were
added to the original 24.)⁴ The Student Opinion Survey (variable
2) was developed for this study in an effort to provide meaningful,
but more-or-less random content. It contains 33 statements pertain-
ing to “college life,” including favorably and unfavorably worded
statements in approximately equal numbers. The items were selected
so that agreements in general would reflect neither a positive nor


³ The variables are described in detail elsewhere (Forehand, 1958).
⁴ Appreciation is expressed to Professor J. C. Nunnally for help in modifying
this instrument.




negative attitude toward the subject matter. In variable 3, acquies-
cence is manifested by agreement with statements contradicting one
another: the items of the California F-scale (Adorno, et al., 1950)
and the “reversed F-scale” items developed by Jackson and Messick
(1957). The Information-true test (variable 4) contains 60 informa-
tion items whose content is so difficult and ambiguous that, upon
pretesting, a 50-50 split of true and false responses was obtained
(Gage, Leavitt & Stone, 1957). Interspersed with these “difficult”
items are 40 “easy” items which were passed by 75 per cent of sub-
jects in the pretest, and which are not scored in the present study.
The acquiescence score is the number of “true” responses to the
“difficult” items.

It is possible that acquiescence is one form of a more general 
response set to choose an alternative and stick to it. The phony
language exam and the Information-true test were therefore scored
for a perseverative tendency (variables 11 and 12). The persevera-
tion score is the number of responses in the subject’s own modal
response category. 
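The scoring rule just described can be sketched in a few lines. This is only an illustration of the stated definition (count of responses in the subject's own modal category); the function name and the sample answer sheet are hypothetical and do not come from the study.

```python
from collections import Counter


def perseveration_score(responses):
    """Number of responses falling in the subject's own modal category."""
    if not responses:
        raise ValueError("at least one response is required")
    counts = Counter(responses)
    # The modal category is whichever alternative the subject used most often.
    return max(counts.values())


# Hypothetical answer sheet for a 10-item agree/disagree test.
sheet = ["agree"] * 7 + ["disagree"] * 3
score = perseveration_score(sheet)
```

With two alternatives the score cannot fall below half the number of items, which is reached when both alternatives are used equally often.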

The tendency to choose extreme alternatives was observed in two 
instruments eliciting affective reactions to stimuli. In the Perceptual 
Reaction Test (Berg, Hunt & Barnes, 1949; Berg & Collier, 1953), 
the stimuli are 60 two-dimensional one-or-two color abstract de- 
signs. In the Activities Preference Checklist, developed for this 
study, the stimuli are 50 leisure activities (e.g. swimming, listening 
to classical music, working crossword puzzles). In both cases, the
response alternatives are “like much,” “like slightly,” “dislike
slightly,” “dislike much.” These tests were scored for extreme posi-
tive response tendency (number of “like much” responses: variables 
5 and 6), extreme negative response tendency (number of “dislike 
much” responses: variables 7 and 8), and general extreme response 
tendency (number of “like much” and “dislike much” responses: 
variables 9 and 10). 
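As a rough illustration of the three extreme-response scores defined above, the following sketch counts responses by category. The response labels follow the text; the function name, the dictionary keys, and the sample protocol are assumptions made for the example.

```python
EXTREME_POSITIVE = "like much"
EXTREME_NEGATIVE = "dislike much"


def extreme_response_scores(responses):
    """Return the three extreme-response counts for one subject."""
    positive = sum(1 for r in responses if r == EXTREME_POSITIVE)
    negative = sum(1 for r in responses if r == EXTREME_NEGATIVE)
    return {
        "extreme_positive": positive,             # "like much" only
        "extreme_negative": negative,             # "dislike much" only
        "extreme_general": positive + negative,   # either extreme
    }


# Hypothetical protocol of six responses.
protocol = ["like much", "like slightly", "dislike much",
            "dislike slightly", "like much", "like slightly"]
scores = extreme_response_scores(protocol)
```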

Measures based on cognitive control concepts (see Table 2). The
equivalence range concept refers to variations in “subjective criteria
used to categorize experiences,” on a continuum ranging from “rela-
tively relaxed and inclusive criteria of similarity” (broad equiva-
lence range) to “more exacting criteria of similarity” (narrow
equivalence range) (Gardner, et al., pp. 105-107). Object sorting
(variable 13) has been used as a criterion measure of equivalence
range in several studies (Gardner, 1954; Gardner, et al., 1959). The
stimulus materials are 73 objects of varying familiarity. The sub-
ject’s task is to “put together into groups the objects which seem to
you to belong together.” No other criteria for grouping are given.
The objects, instructions and method of administration duplicated
those described by Gardner, et al. A large number of groups is in-
terpreted as indicating narrow equivalence range. The judged 
equivalence of phrases test (variable 14), developed specifically for 
this study, consists of 12 sentences, each with an underlined phrase. 
Each sentence is accompanied by six phrases, and subjects are 
asked to mark all phrases “which can be substituted for the under-
lined phrase without changing the essential meaning of the sen- 
tence.” This variable is included as a proposed measure of equiva- 
lence range since performing the task can be seen as requiring 
equivalence judgments on phrases, and hence on sentences. Persons 
with broad equivalence ranges would be expected to mark many 
phrases, i.e., to report that many phrases are equivalent to the
originals. The score is the total number of phrases marked on all 12 
sentences (maximum, 72). Estimation of proportions (variable 15) 
is an adaptation of Brim’s “desire for certainty” instrument (Brim 
and Hoff, 1957). The 32 items are typified by: “The chances that an 
American citizen will believe in God are about ____ in 100.” The
subject’s task is to estimate the appropriate percentages. This task
may be conceived as one of estimating the size of an equivalence
group. A subject with a broad equivalence range may be expected 
to estimate extreme proportions on the test, since either an ex- 
tremely high or extremely low estimate effectively places a large 
number of objects into a single category. A subject's score is ob- 
tained by subtracting his estimate from 50 and averaging the abso- 
lute value of these deviations over the 32 items. 
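The extremeness score described above is a mean absolute deviation from 50. A minimal sketch follows; the function name and the four sample estimates are hypothetical, and the actual instrument had 32 items.

```python
def extremeness_of_proportions(estimates, midpoint=50.0):
    """Mean absolute deviation of the percentage estimates from the midpoint."""
    if not estimates:
        raise ValueError("at least one estimate is required")
    return sum(abs(midpoint - e) for e in estimates) / len(estimates)


# Hypothetical estimates (chances in 100) for four items.
sample_estimates = [50, 90, 10, 70]
sample_score = extremeness_of_proportions(sample_estimates)
```

Estimates of 90 and 10 contribute equally to the score, which reflects the reasoning in the text that either a very high or a very low estimate places many objects in a single category.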


The concept of field articulation is a synthesis of Klein’s flexible
versus restricted control hypothesis, which describes variations in
reactions to contradictory or intrusive cues (Klein, 1954), and
Witkin’s concept of field-dependence versus field-independence,
which is defined by individual differences in the reliance on external
(e.g., visual) in preference to internal (e.g., bodily) cues and in the
ability to extract an “item” from a field in which it appears
(Witkin, et al., 1951). Measures of these two concepts defined a
single factor in the study by Gardner, et al. (1959).⁵ The factor was

⁵ Gardner and his associates analyzed responses from their male and female
samples separately. This factor was obtained in their female sample.




labeled “field articulation,” and was characterized by ability to 
sort out and attend to relevant cues in a field perceived as contain- 
ing incongruous cues. 

The color-word interference test (variable 16) was used by 
Gardner, et al. (1959) as a measure of field articulation, and by 
Klein (1954) as a measure of constricted versus flexible control. 
The form of the test used in this study was described in detail by 
Thurstone (1944). There are three parts to the test. In each part 
the subjects are asked to perform the task as quickly and accurately 
as possible. The tasks are: (1) reading aloud a page of 100 color 
names, randomly arranged in 10 lines of 10 words each; (2) naming 
aloud the colors of 100 colored spots arranged randomly in 10 lines 
of 10 spots each; and (3) naming aloud the colors in which 100 
words are printed, the words being incongruent color names. The 
first task is used only as a “warm up” and to detect reading 
anomalies. The remaining parts of the test provide measures of 
speed of recognizing and naming colors under two conditions: with 
and without interference from incongruous cues. An “interference 
score” is defined as the difference between actual reading time under 
the interference condition and the time which would be predicted 
from reading time under the non-interference condition on the basis 
of regression.⁶ Embedded figures tests (variable 17) have been used
by Witkin and his associates (1951) as a measure of field de- 
pendence versus field independence, and by Gardner, et al. (1959) 
as a measure of field articulation. The present study employed 
Thurstone’s experimental version of this test (1944), while the 
above-mentioned investigators used Witkin’s “more difficult” colored
version (1950). In either case, effective performance requires that 
subjects overcome the distracting effects of the extra lines in the 
complex figures. The word-spacing test (variable 18) consists of 
three paragraphs, the letters of which occur in the correct order 
but with the spaces incorrectly arranged. The subject’s task is to 
indicate the correct placement of spaces. The score is the number
of correct minus the number of incorrect placements during the al- 
lotted time. In this test, the tendency to attend to groups of letters 


⁶ The correlation of any other variable with the color-word interference
score was found algebraically, by computing the part correlation of the
variable with reading time (interference condition) which is not correlated
with reading time (non-interference condition). The correlation between the
two reading times was .61.


as words is distracting; the subject must check this tendency and 
supply the proper structure in the absence of conventional struc- 
tural cues. Effective performance of this task, although undoubtedly 
dependent to a large degree on purely verbal factors, would appear 
to be facilitated by field articulation. 
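The color-word interference score defined earlier in this section is a regression residual, and a correlation with such a residual can be obtained either directly or algebraically as a part correlation. The sketch below illustrates both routes and shows that they agree. It is an illustration only: the function names and the six-subject data are hypothetical, and the part-correlation formula used is the standard one, not a formula quoted from the article.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def interference_scores(interference_time, plain_time):
    """Actual minus regression-predicted time under the interference condition."""
    mx, my = mean(plain_time), mean(interference_time)
    slope = (sum((x - mx) * (y - my)
                 for x, y in zip(plain_time, interference_time))
             / sum((x - mx) ** 2 for x in plain_time))
    intercept = my - slope * mx
    return [y - (intercept + slope * x)
            for x, y in zip(plain_time, interference_time)]


def part_correlation(r_vy, r_vx, r_xy):
    """Correlation of v with the part of y that is uncorrelated with x."""
    return (r_vy - r_vx * r_xy) / math.sqrt(1.0 - r_xy ** 2)


# Hypothetical times in seconds for six subjects, plus an arbitrary third variable.
plain = [50.0, 55.0, 60.0, 62.0, 70.0, 75.0]
interference = [90.0, 100.0, 98.0, 115.0, 120.0, 140.0]
other = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]

residuals = interference_scores(interference, plain)
direct = pearson_r(other, residuals)
algebraic = part_correlation(pearson_r(other, interference),
                             pearson_r(other, plain),
                             pearson_r(plain, interference))
```

The algebraic route needs only the three zero-order correlations, which may be why it was convenient when the residual scores themselves were not computed for each subject.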


Subjects 


Ninety-six subjects, all of them students in 1958 summer classes 
at the University of Illinois, participated in the study. The majority 
(69) were volunteers from graduate and advanced undergraduate 
classes in Education; the remaining 27 were students of elementary 
psychology who participated in fulfillment of a course requirement. 
There were 51 male and 45 female subjects. 

The tests were administered in two sessions, one individual and 
one group. During the individual session, the object sorting and 
color-word interference tests were administered. The remaining tests 
were given in group sessions, lasting approximately one and one- 
half hours.


Results 

Reliability and Generality of Response Sets 

Internal consistency reliability coefficients and product moment 
intercorrelations among response set measures are presented in 
Table 1. Reliabilities are, in general, substantial, a result consistent 
with many other studies.

It will be noted that tendencies to respond with extreme positive
responses, with extreme negative responses, or with extreme re-
sponses generally as opposed to “like slightly” or “dislike slightly”
responses on the Perceptual Reaction Test (PRT) are significantly
related to similarly defined tendencies in the Activities Preference
Checklist. The two measures of perseveration are not significantly
related, but both have significant negative correlations with general
extreme response tendency in the PRT. It will be noted from the
definition of the perseveration scores that the scores will be smallest
when the two alternatives are endorsed approximately an equal
number of times; thus, decreasing scores reflect increasing evenness
of response. In the PRT the present subjects tended to avoid ex-
treme responses; the mean number of extreme responses is 17.0, as
compared with an expected 30 if responses had been evenly dis-
tributed over the four alternatives. Those persons who more often
endorsed extreme alternatives were, in this instance, those whose
responses were more evenly distributed. It might be suggested that
a tendency to spread responses evenly over the available alterna-
tives accounts for the relationship between the perseveration and
the extreme response tendency scores.

TABLE 1
Intercorrelations of Response Set Measures

[Table 1 is not recoverable from the scanned source. It reported internal
consistency reliabilities and intercorrelations for the response set
measures, variables 1 through 12.]

Such a tendency would not account for the relationship between 
extreme responses to the PRT and to the Activities Checklist, how- 
ever. In the latter instrument, the mean number of extreme re- 
sponses was 22.5, as compared with an expectation of 25 if the re- 
sponses were evenly distributed. It would appear that there are 
two “factors” in the tendency to give extreme responses to the PRT: 
a willingness to choose extremes in general, and a tendency to 
equalize the distribution of responses among the available alterna- 
tives. 
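The expected counts of 30 and 25 quoted above follow from assuming that responses are spread evenly over the four alternatives, two of which are extreme. A small worked check, with a hypothetical function name and the item counts taken from the test descriptions:

```python
def expected_extreme_count(n_items, n_alternatives=4, n_extreme=2):
    """Expected number of extreme responses if all alternatives are used equally."""
    return n_items * n_extreme / n_alternatives


# Item counts come from the test descriptions; observed means from the text.
prt_expected = expected_extreme_count(60)        # Perceptual Reaction Test
checklist_expected = expected_extreme_count(50)  # Activities Preference Checklist
prt_shortfall = prt_expected - 17.0
checklist_shortfall = checklist_expected - 22.5
```

On these figures the observed mean falls 13 responses short of an even spread in the PRT but only 2.5 short in the Checklist, which is the contrast the two-factor interpretation rests on.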

With the exception of the correlation of .39 between acquiescence 
on the F- and reversed F-scales and on the Student Opinion Survey,
the measures of acquiescence are not significantly interrelated. The
exception cannot be interpreted solely in terms of acquiescent re-
sponse set: agreement with items in the Student Opinion Survey
correlates .54 with the original F-scale and −.01 with the reversed
F-scale, while the latter two measures are correlated .04. It would
appear that content factors have “crept in” to obscure the operation
of response set. The results of Gage, et al. (1957), which indicate
that positively-worded “authoritarian” items were better predictors
of observer-rated authoritarianism than were negatively-worded
items, would support an argument that the Student Opinion Survey
is biased with unforeseen authoritarian content.


Relationships Among Cognitive Control Measures


Intercorrelations and reliability estimates of the measures based
upon cognitive control concepts are presented in Table 2. It will be
noted that variables 13, 14, and 15, which purported to tap ten-
dencies in the inclusiveness with which equivalence judgments are
made, are not significantly intercorrelated. The lack of correlation
between color-word interference and embedded figures scores con-
trasts with the results obtained by Gardner, et al. in their female
sample. (It should be recalled that black-and-white embedded fig-
ures stimuli were used in the present study while colored stimuli
were used by Gardner and his associates).⁷ Scores on the word-
spacing test are significantly related to performance in both the
color-word interference test and the embedded figures test, as pre-
dicted.

TABLE 2
Intercorrelations Among Cognitive Control Measures

Variable                                   (−)13     14      15    (−)16     17      18

(−)13. Object Sorting (number of groups)    —ᶜ
14. Judged Equivalence of Phrases          .143    .922ᵇ
15. Estimation of Proportions              .087   −.032    .414ᵇ
(−)16. Color-Word Interference Score       .063   −.013    .192     —ᶜ
17. Embedded Figures Test                 −.071    .102    .111    .062     —ᶜ
18. Word-Spacing Test                      .023    .038    .030    .211*   .349**   .705

Note: Internal consistency reliability estimates are in the main diagonal. Signs for variables
13 and 16 are reversed, so that a relationship in the predicted direction is indicated by
a positive coefficient.
* Significant at the .05 level
** Significant at the .01 level (two sided tests)
ᵇ Split half (corrected) reliability
ᶜ Reliability data not available
ᵈ Mean intercorrelation of the three parts


Relationships Between Response Sets and 
Cognitive Control Measures 


Some relationships between response sets and cognitive control 
measures may be predicted on the basis of analogies between the 
two types of concepts. It has been hypothesized that acquiescence 
to items containing such phrases as “every person,” “no person,” 
“all,” “most important,” and so forth, reflects a tendency to over-
generalize (Jackson and Messick, 1957). It may be predicted that 
agreement with categorical statements—exemplified by variables 
1, 2, and 3 in the present study—would be related to width of 
equivalence range since it implies relaxed criteria for categorizing. 
The correlations in Table 3, however, fail to support this prediction. 

Selection of a large number of extreme alternatives might also be
a manifestation of relaxed criteria for categorizing. This interpre-
tation is supported by the relationships of extreme negative and
extreme general response tendencies in the Activities Preference
Checklist, with extremeness of proportions in the estimation of pro-
portions task. These relationships are consistent with results re-
ported by Brim and Hoff (1957) and with their hypothesis that both
extreme proportions and extreme response tendencies reflect a “de-
sire for certainty.” Since other measures of extreme response ten-
dency and categorizing behavior are not interrelated, the latter in-
terpretation is suggested as the more tenable one.

⁷ Thurstone (1944), using black-and-white embedded figures, reported
a near-zero correlation between the two tests.

TABLE 3
Correlations Between Response Set and Cognitive Control Measures

[Table 3 is not recoverable from the scanned source. Rows are the response
set measures (variables 1 through 12); columns are the cognitive control
measures (variables 13 through 18).]

Note: Signs for variables 13 and 16 are reversed.
* Significant at the .05 level
** Significant at the .01 level (two sided tests)

The meaninglessness and ambiguity of the items in the phony
language exam and the Information-true test, items apparently
calling for an objective content response, are in some ways
analogous to the type of stimulus situation in which field articula-
tion appears: an ambiguous field in response to which the subject
must sort out and attend to relevant cues. In this type of situation
“low articulators” are described as likely to organize stimuli along
the simplest possible lines, resolve incongruity by responding only
to the most compelling elements and ignoring others, and, when
such resolution is impossible, respond in ways indicating disruption
of performance. It is plausible that these subjects might respond to
unstructured test items in terms of a perseverative response set.

The relationship of −.20 between performance on the word-
spacing test and perseveration in the Information-true test tends to
support this interpretation. Again, however, this generalization is
not verified by relationships among other measures of similar
variables, An alternative interpretation is suggested in the discus- 
sion section. 

Agreement to the combined F- and reversed F-scales is signifi-
cantly negatively correlated with performance on the embedded
figures and word-spacing tests: −.25 and −.34, respectively. The
F-scale separately correlates −.12 and −.13 with these vari-
ables, and the reversed F-scale correlates −.10 and −.18 with them,
indicating that in this instance set, rather than reaction to content,
accounts for the relationships. Some implications of these findings
are discussed below.


Discussion

Perhaps the most striking result of this study is the specificity
of the measures of acquiescence. Acquiescence is often spoken of as
if it were a general trait, and many studies of correlates of acquies-
cence have simply assumed this to be the case. The reliability and
intra-test consistency of acquiescence, which has often been demon- 
strated, indicates that acquiescence in a given test may be studied 
as a construct. The results of the present study, however, suggest 
restraint in generalizing the concept as a common trait to describe 
response tendencies, albeit defined by similar overt behaviors, in 
different instruments. The same can be said for the response perse- 
veration concept. 

Measures of extreme response tendencies in two different instru- 
ments tended to be interrelated in this study. The instruments in 
which these response sets were observed both called for affective 
responses to stimuli, i.e., response in terms of “liking.” Whether this 
generality would extend to other kinds of extreme responses (e.g.,
in agreement or disagreement with attitude items, acceptance or re-
jection of self-descriptive statements) is a matter for further re-
search; the results with respect to acquiescence and perseveration 
suggest that the generality not be taken for granted. Research on 
the conditions under which response sets in one instrument are re- 
lated to similarly defined behaviors in another instrument, would 
seem to be essential to further study of the psychological meaning 
of response sets. Among the conditions which might be profitably 
investigated are the form of the items, the nature of the content, 
and the motivational context of the testing situation. 

This study failed to support the generalization of the concept of 
equivalence range to predict relationships among responses which 
might be based upon categorization of experiences—responses to the 
object-sorting, equivalence of phrases, and estimation of proportions 
tasks. Other studies, discussed by Gardner et al. (1959), have
reported more successful generalizations of the equivalence range
concept. The variables which have correlated with criterion measures
of equivalence range (usually object-sorting) are generally of
a more strictly perceptual nature than those used in the present
study. It is possible that such factors as knowledge and verbal
ability influence the behaviors observed in the present study to such
an extent that their relationship with equivalence range is obscured.
Further research is necessary to determine whether this is the case
or whether the equivalence range concept cannot be generalized
beyond limited domains of behavior.




The concept of field articulation was the basis for predicting
relationships among responses to stimulus configurations containing
ambiguous and competing cues in the color-word interference,
embedded figures, and word-spacing tests. The evidence regarding this
interpretation is ambiguous: word-spacing test performance corre- 
lated significantly with each of the other two measures, while the 
latter were not correlated with one another. If these results should 
be replicated in further research, a multi-factor interpretation of the 
word-spacing test would be suggested. In any event, word-spacing 
tests and perhaps similar measures using verbal stimuli appear to 
be promising variables for research on cognitive behavior. 

Jackson and Messick (1958) hypothesized that response sets (or 
response styles in the terminology they prefer) are manifestations 
of general stylistic tendencies which would also be manifested in 
measures of perception and cognition. Because of this study’s
necessarily limited sampling of the domain of cognitive behaviors and
the absence of theoretical grounds for predicting specific relation- 
ships, it is difficult to assess the implications of the present results 
for this hypothesis. Several interpretations of response sets in terms 
of equivalence range and field articulation were proposed. Although 
some of the results would tend to confirm these interpretations, they 
are scattered and near the borderline of statistical significance; the 
clustering of variables which would be expected from these inter[...]
variables, including performance on the embedded figures and
word-spacing variables, acquiescence in the F- and reversed F-scales,
and possibly persever[...] further study. A study by [...] suggests
that acquiescence on the F- and reversed F-scales is related to [...]
that rigidity, or inability to shift [...]s among these variables
may be [...]


Summary 

The intercorrelations among 12 measures of response set and six
measures based on concepts of cognitive control proposed by Klein,
Gardner, and others were examined. The response set variables
included [...]


GARLIE A. FOREHAND 301 


[...] response tendency, extreme negative response tendency, and general
extreme response tendency, each in two different tests. Three vari- 
ables were based on the cognitive control concept of equivalence 
range—the breadth of the range of experiences one is willing to 
classify as being equivalent: inclusiveness of categorizing in an un- 
structured object-sorting situation; willingness to accept verbal 
phrases as equivalent in meaning to a given phrase; and extreme- 
ness of estimated proportions of persons or things having a common 
characteristic. Three variables were based on the concept of field 
articulation—the tendency to respond selectively to stimulus con- 
figurations perceived as containing competing or contradictory cues: 
ability to respond to task-relevant cues in the presence of incon- 
gruous cues in a color-word interference test; performance in an 
embedded figures test; and ability to reproduce a meaningful verbal 
structure from paragraphs, the spacing of which was incorrect. 

Extreme response tendencies in one instrument were significantly 
related to similarly defined tendencies in another instrument. Meas- 
ures of acquiescence and perseveration were not, in general, inter- 
related. Variables based on the equivalence range concept were not 
significantly intercorrelated. Two of the three intercorrelations 
among field articulation measures were statistically significant.
Relationships between response sets and cognitive variables were
examined, and their implications for an interpretation of response sets
as manifestations of stylistic tendencies also found in cognitive
behaviors were discussed.


REFERENCES 


Adorno, T. W., Frenkel-Brunswik, Else, Levinson, D. J., and Sanford,
R. N. The Authoritarian Personality. New York: Harper and Brothers,
1950.
Berg, I. A. “Response Bias and Personality: The Deviation
Hypothesis.” Journal of Psychology, XL (1955), 61-72.
Berg, I. A. and Collier, Joanne S. “Personality and Group Differences
in Extreme Response Sets.” EDUCATIONAL AND PSYCHOLOGICAL
MEASUREMENT, XIII (1953), [...].
Berg, I. A., Hunt, W. A., and Barnes, E. H. The Perceptual Reaction
Test. Baton Rouge, Louisiana: Louisiana State University Press, 1949.
Berg, I. A. and Rapaport, C. M. “Response Bias in an Unstructured
Questionnaire.” Journal of Psychology, XXXVIII (1954), [...]481.


Brim, O. G. and Hoff, D. B. “Individual and Situational Differences
in Desire for Certainty.” Journal of Abnormal and Social Psychology,
LIV (1957), 225-229.
Cronbach, L. J. “Response Sets and Test Validity.” EDUCATIONAL
AND PSYCHOLOGICAL MEASUREMENT, VI (1946), 475-494.
Cronbach, L. J. “Further Evidence on Response Sets and Test Design.”
EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, X (1950), 3-31.
Forehand, G. A. “Cognitive Correlates of Response Style.” Unpublished
Ph.D. thesis, University of Illinois, 1958.
Gage, N. L., Leavitt, G. S. and Stone, G. C. “The Psychological
Meaning of Acquiescence Set for Authoritarianism.” Journal of
Abnormal and Social Psychology, LV (1957), 98-104.
Gardner, R., Holzman, P. S., Klein, G. S., Linton, Harriet, and
Spence, D. P. “Cognitive Control: A Study of Individual Consistencies
in Cognitive Behavior.” Psychological Issues, I (1959), No. 4.
George, S. S. “Attitude in Relation to the Psychophysical Judgment.”
American Journal of Psychology, XXVIII (1917), 1-37.
Gibson, J. J. “A Critical Review of the Concept of Set in Con[...]
Jackson, D. N. and Messick, S. J. “A Note on Ethnocentrism and
Acquiescent Response Sets.” Journal of Abnormal and Social
Psychology, LIV (1957), 132-134.
Jackson, D. N. and Messick, S. J. “Content and Style in Personality
Assessment.” Psychological Bulletin, LV (1958), 243-252.
Jackson, D. N., Messick, S. J., and Solley, C. N. “How ‘Rigid’ Is
the ‘Authoritarian’?” Journal of Abnormal and Social Psychology,
LIV (1957), 137-141.
Klein, G. S. “Need and Regulation.” In Jones, M. B. (Editor),
Nebraska Symposium on Motivation. Lincoln, Nebraska: University
of Nebraska Press, 1954, 224-280.

Nunnally, J. C. and Husek, T. R. “The Phony Language Examination:
An Approach to the Measurement of Response Bias.” EDUCATIONAL
AND PSYCHOLOGICAL MEASUREMENT, XVIII (1958), [...].
[...] N., [...], and Hollander, E. P. “Middle Category Response:
Reliability and Relationship to Personality and [...].” EDUCATIONAL
AND PSYCHOLOGICAL MEASUREMENT, [...].
[...] Factorial Study [...]. [...] Monographs, No. 4. Chicago:
University of Chicago Press, 1944.
[...] Journal of Personality, XIX (1950), 1-15.
Witkin, H. A., Lewis, H. B., Hertzman, M., Machover, K., Meissner,
P. B., and Wapner, S. Personality Through Perception. New York:
Harper and Brothers, 1954.




EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 2, 1962


SOME PROBLEMS IN THE DESCRIPTION OF CHANGE¹


CHESTER W. HARRIS 


Department of Educational Psychology 
University of Wisconsin 


The purpose of this discussion is to explore several possibilities
for treating change over time in a set of attributes. For the most
part, we shall assume that the “same” set of persons has been
measured, in some fashion, on two separate occasions, and that this
change over time is to be described. Such change may be called
“growth” or “development” if one wishes; we shall use the fairly
neutral term “change.” If only one attribute has been measured on
each of the two occasions, we shall call this a univariate problem.
If more than one attribute has been measured on the two occasions,
we shall call the problem multivariate. For the univariate problem,
crude gain, standard score gain, or residual gain might be examined.
Problems such as the reliability of difference scores arise here.
In addition to the work of DuBois (1957), papers by Lord (1956,
1958) and McNemar (1958) are relevant to the univariate case.


Our interest is in the multivariate case. We begin by making
some restrictions that may appear to be rather drastic. We assume
a set of measures for N individuals at time 1 and a second,
analogous, comparable, or identical set of measures at time 2 for
the “same” individuals. We also assume that the “treatment” of
the N individuals during this period from time 1 to time 2 can
be described. (A training program, psychotherapy, etc., illustrate
treatments.)

But we shall also assume that only one treatment has been
introduced and that we are now concerned with describing the
change that has occurred under these conditions. This is the kind
of problem faced by the psychiatrist or the teacher when he
attempts to describe in what ways a group of patients or a class of
students have changed during the period of his tutelage. As such,
it is a familiar and realistic problem.

¹Paper prepared for Conference on Research Methodology in Training,
Washington University, April 14 and 15, 1961.

This limited objective should not obscure the point that an
experimental study of the relative effectiveness of various treatments
is also an important problem. Such studies can be handled very
well today if only a single dependent variable (criterion) is
involved. Treatment, the independent variable, can be considered
either as a fixed or a random effect, and the appropriate confidence
intervals for the means of the dependent (criterion) variable can
be developed. The design can be elaborated by introducing
stratifying variables (such as sex, or social status, or previous
experiential pattern), and tests of interactions of these with the
treatment variable can be made. Furthermore, status at time 1 in one
or more measures, including a “pre-test” of the criterion variable,
can be utilized, in the analysis of covariance, to introduce a
statistical control that is much to be preferred to an attempt to
“match” cases. All this is certainly to the good. However, multiple
dependent variables still remain something of a problem. One can
test for “size” and “shape,” though the test of the latter may be
“sticky,” as Greenhouse and Geisser (1959), for example, indicate.
For this reason, we wish to explore the multivariate descriptive
problem in some detail, outlining possibilities and potential “booby
[...]



[...] 2. It is now evident that $Y_2 - Y_1$ is the matrix of crude gains.
It is well known (though occasionally forgotten in practice) that for
any individual his crude gain expressed as a deviation score (from
the mean of the crude gains) is simply the difference between his
deviation scores on the two occasions.

Let us develop this simple notion rather generally. For any N
we can define a symmetric matrix L with (N − 1)/N in the main
diagonal and −1/N elsewhere. This matrix L is such that

$$YL = \|y\|,$$

with $\|y\|$ symbolizing deviation scores. Let us illustrate. Suppose Y
consists of one row and four columns, with the entries: 5, 8, 3,
and 2. Then

$$
\begin{Vmatrix} 5 & 8 & 3 & 2 \end{Vmatrix}
\begin{Vmatrix}
 3/4 & -1/4 & -1/4 & -1/4 \\
-1/4 &  3/4 & -1/4 & -1/4 \\
-1/4 & -1/4 &  3/4 & -1/4 \\
-1/4 & -1/4 & -1/4 &  3/4
\end{Vmatrix}
= \begin{Vmatrix} +0.5 & +3.5 & -1.5 & -2.5 \end{Vmatrix}
$$

or $YL = \|y\|$. Knowing N, we can define L unambiguously so as to
make this equation true for one-rowed Y of N elements; since it
holds for any row of such a Y, it holds for as many rows of Y as
one wishes to introduce.
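The illustration above can be checked numerically. The following is a minimal sketch in plain Python, assuming exact fractions are acceptable for the entries of L; the helper names `centering_matrix` and `matmul` are hypothetical and are not part of the original paper.

```python
from fractions import Fraction

def centering_matrix(n):
    """Return the n x n matrix L: (n - 1)/n on the diagonal, -1/n elsewhere."""
    return [[Fraction(n - 1, n) if i == j else Fraction(-1, n) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Row-by-column product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# One measure (row) observed on four persons (columns), as in the text.
Y = [[5, 8, 3, 2]]
L = centering_matrix(4)

# Post-multiplying by L subtracts the row mean (4.5) from every score.
deviations = matmul(Y, L)
```

Because every row of Y is treated separately, the same call works unchanged when Y has several rows, one per measure.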
Now, $Y_1L$ and $Y_2L$ give deviation scores for occasion 1 and
occasion 2. Therefore, since matrix multiplication is distributive
over subtraction, we have

$$Y_2L - Y_1L = (Y_2 - Y_1)L,$$

or the generalization that deviated crude gains equal the difference
between the deviated raw scores on the two occasions.

Another operation that we will wish to employ is that of changing
the scale or variance of one or more rows of Y. For example,
one might wish to turn crude gains into standard scores by
multiplying the deviated crude gains by a constant to make the sum
of squares of the transformed gains equal to N for each measure.
Such a transformation occurs when we pre-multiply $(Y_2 - Y_1)L$
by a diagonal matrix, the non-zero elements of which are the
reciprocals of the “classic” standard deviations of the crude gains.
We would then have $D_g(Y_2 - Y_1)L$. Note that this differs from
$(D_2Y_2 - D_1Y_1)L$, which is the expression for the differences
between standard scores on the two occasions. It would be possible
to have a case in which the variances of the corresponding rows
of $Y_2$ and $Y_1$ are identical, i.e., $D_2 = D_1$, but this would not
insure that $D_g = D_2 = D_1$. In order to show this, and to
demonstrate a general method for arriving at these diagonal matrices,
we will work up a matrix that is proportional to the
variance-covariance matrix.

First we need to observe that the matrix L is highly specialized.
For one, it sums to zero all ways; this, of course, is related to the
fact that deviation scores sum to zero. For another, it is idempotent,
that is, $L = L^2$. This can readily be verified by direct
multiplication. This can also be seen to be true when we consider that
L deviates any row of N elements from its mean; the effect, then,
of post-multiplying L by L is to deviate each row of L from its
mean. But the sum of any row of L is zero, and consequently the
mean of such a row is zero. Therefore, multiplying L by L leaves
L unchanged. For any positive integer p, $L^p = L$. (We restrict
p to positive integers since L is singular and has no inverse.) We
have already pointed out that L is symmetric, and therefore $L = L'$.
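The idempotence of L can also be shown algebraically. The sketch below assumes the compact form $L = I - \frac{1}{N}\mathbf{1}\mathbf{1}'$, where $I$ is the identity matrix and $\mathbf{1}$ is a column of N ones; this notation is introduced here for illustration and does not appear in the original text.

```latex
\begin{aligned}
L^2 &= \Bigl(I - \tfrac{1}{N}\mathbf{1}\mathbf{1}'\Bigr)
       \Bigl(I - \tfrac{1}{N}\mathbf{1}\mathbf{1}'\Bigr) \\
    &= I - \tfrac{2}{N}\mathbf{1}\mathbf{1}'
       + \tfrac{1}{N^2}\mathbf{1}(\mathbf{1}'\mathbf{1})\mathbf{1}' \\
    &= I - \tfrac{2}{N}\mathbf{1}\mathbf{1}'
       + \tfrac{1}{N}\mathbf{1}\mathbf{1}'
       \qquad (\text{since } \mathbf{1}'\mathbf{1} = N) \\
    &= I - \tfrac{1}{N}\mathbf{1}\mathbf{1}' = L .
\end{aligned}
```

The same form makes the singularity evident, since $L\mathbf{1} = \mathbf{1} - \frac{1}{N}\mathbf{1}(\mathbf{1}'\mathbf{1}) = 0$.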

We can generate the numerators of the variances and covariances
of any set of data Y, by operating on the deviation scores.
The numerator of a variance is given by squaring deviation scores
and summing; of a covariance, by summing the cross-products of
deviation scores. This can be expressed by the matrix multiplication:
$YLL'Y' = YLY'$, because L is symmetric and idempotent.
If we then wish to determine the matrix D that transforms YL
into standard scores, we take for each non-zero element of D the
square root of the quantity N divided by the relevant diagonal
element in $YLY'$. Pre- and post-multiplying $YLY'$ by this diagonal
matrix D would yield a symmetric matrix with N in the main
diagonal and N times the correlation coefficient, r, in the
off-diagonals. Had we simply normalized the matrix $YLY'$, we would
have employed as the diagonal matrix one whose non-zero elements
are the reciprocals of the square roots of the diagonal elements
of $YLY'$. The normalized $YLY'$ is a matrix of correlations
with unit variances in the diagonal. These two forms ($DYLY'D$
and the normalized $YLY'$, which we will write as R) differ only
by a scalar and thus are equivalent for many purposes.
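As a rough numerical check of the two scalings just described, the following sketch builds $YLY'$ for a small, made-up data matrix and forms both $DYLY'D$ and the normalized matrix R. The data values and the helper names (`deviate_rows`, `cross_products`) are assumptions chosen only for illustration.

```python
import math

def deviate_rows(rows):
    """Return YL: every row expressed as deviations from its own mean."""
    return [[x - sum(row) / len(row) for x in row] for row in rows]

def cross_products(a, b):
    """Return a b' for two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in b] for ra in a]

# Hypothetical data: two measures (rows) on four persons (columns).
Y = [[5.0, 8.0, 3.0, 2.0],
     [1.0, 4.0, 2.0, 5.0]]
N = len(Y[0])

YL = deviate_rows(Y)
YLY = cross_products(YL, YL)      # numerators of variances and covariances
m = len(YLY)

# Diagonal of D: square root of N over the relevant diagonal element of YLY'.
d = [math.sqrt(N / YLY[i][i]) for i in range(m)]
DYLYD = [[d[i] * YLY[i][j] * d[j] for j in range(m)] for i in range(m)]

# Simple normalization: reciprocal square roots of the diagonal of YLY'.
n = [1.0 / math.sqrt(YLY[i][i]) for i in range(m)]
R = [[n[i] * YLY[i][j] * n[j] for j in range(m)] for i in range(m)]
```

Under these assumptions every element of `DYLYD` is N times the corresponding element of `R`, which is the scalar difference the text refers to.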
Now we can look at the diagonal matrices that give us gains
expressed as standard scores, and the diagonal matrix that gives
differences between standard scores. In the first case, we need

$$(Y_2 - Y_1)L(Y_2 - Y_1)' = Y_2LY_2' + Y_1LY_1' - Y_2LY_1' - Y_1LY_2'.$$

We might write these four elements as a supermatrix:

$$
\begin{Vmatrix}
Y_2LY_2' & Y_2LY_1' \\
Y_1LY_2' & Y_1LY_1'
\end{Vmatrix}
\tag{1}
$$

and note that in order to find $D_g$, which transforms deviated gains
into standard scores, we first add together the ith diagonal elements
of $Y_2LY_2'$ and $Y_1LY_1'$ and subtract the corresponding ith diagonal
elements of $Y_2LY_1'$ and $Y_1LY_2'$. The square root of N divided by
this number is the ith diagonal element of $D_g$. In the second case,
however, we build $D_2$ from N and the diagonal elements of $Y_2LY_2'$
only, and $D_1$ from N and the diagonal elements of $Y_1LY_1'$ only.
Clearly, then, even though the variances of corresponding rows of
$Y_2$ and $Y_1$ were equal, i.e., $D_2 = D_1$, the non-zero elements of $D_g$
will differ since they are also a function of the diagonal elements of
$Y_2LY_1'$ and $Y_1LY_2'$ or, fundamentally, the correlations across the
two occasions.

In addition to crude gains and gains taken either as standard
scores or as differences of standard scores on the two occasions,
residual gains might be employed. These may be defined as
$(D_2Y_2 - D_rD_1Y_1)L$, with the understanding that $D_r$ designates a
diagonal matrix of the correlations across the two occasions for the
“same” measure. What has been done here is to subtract from the
standard score on occasion 2, the standard score on occasion 1
weighted by the correlation of this measure over the two occasions.
If we post-multiply $(D_2Y_2 - D_rD_1Y_1)L$ by its transpose and again
display the four elements as a supermatrix, we get:

$$
\begin{Vmatrix}
D_2Y_2LY_2'D_2 & D_2Y_2LY_1'D_1D_r \\
D_rD_1Y_1LY_2'D_2 & D_rD_1Y_1LY_1'D_1D_r
\end{Vmatrix}
\tag{2}
$$

The form (2) is seen to be a new weighting of (1).

Let us summarize these measures of gain:

1. crude gain                       $Y_2 - Y_1$
2. deviated crude gain              $(Y_2 - Y_1)L$
3. standard score gain              $D_g(Y_2 - Y_1)L$
4. differences in standard score    $(D_2Y_2 - D_1Y_1)L$
5. residual gain                    $(D_2Y_2 - D_rD_1Y_1)L$
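The five measures summarized above can be sketched for a single pre/post pair as follows. This is an illustrative sketch only: the scores are invented, standard deviations use N as the divisor (the “classic” form mentioned earlier), and the function name `gain_measures` and its dictionary keys are hypothetical.

```python
import math

def deviate(xs):
    """Deviation scores: each value minus the mean."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def standardize(xs):
    """Standard scores using the 'classic' standard deviation (divisor N)."""
    d = deviate(xs)
    sd = math.sqrt(sum(v * v for v in d) / len(d))
    return [v / sd for v in d]

def correlation(xs, ys):
    """Product-moment correlation as the mean product of standard scores."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

def gain_measures(pre, post):
    """Return the five measures of gain for one pre/post pair of scores."""
    crude = [b - a for a, b in zip(pre, post)]
    z1, z2 = standardize(pre), standardize(post)
    r = correlation(pre, post)
    return {
        "crude": crude,                                                   # type 1
        "deviated": deviate(crude),                                       # type 2
        "standard_score_gain": standardize(crude),                        # type 3
        "difference_in_standard_scores": [b - a for a, b in zip(z1, z2)], # type 4
        "residual": [b - r * a for a, b in zip(z1, z2)],                  # type 5
    }

# Hypothetical scores for four persons on the two occasions.
pre = [5.0, 8.0, 3.0, 2.0]
post = [7.0, 9.0, 6.0, 2.0]
gains = gain_measures(pre, post)
```

With several measures one would simply apply `gain_measures` row by row, which corresponds to the diagonal form of the weighting matrices.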


Certain similarities and differences among these five measures are
evident. In particular, if we ask how these five types of measures
might differ in their correlation with some variable not included
in the $Y_2$, $Y_1$ system, we would note that the first three (crude gain,
deviated crude gain, and standard score gain) would have identical
correlations with this outside measure for the same pre- and
post-measure. Number 4, differences in standard score, correlates
differently with this outside measure unless $D_2 = D_1$, or the
variances of each pair of pre- and post-measures are identical. The
fifth type differs from the fourth in the weighting $D_r$ applied to
$D_1Y_1$. If $D_r$ were the identity matrix (perfect correlation between
$Y_2$ and $Y_1$), this would appear to reduce to the fourth type;
however, with perfect correlation the standard scores are identical and
the fourth type reduces to a null matrix. The case of $D_rD_1 = D_2$
also seems to be a possibility, but the necessary restriction is that
the sum of squares of the deviated post-measure be equal to the
sum of cross-products of the deviated pre- and post-measures;
this can occur (for real numbers) only when $Y_2L = Y_1L$, and
we have a matrix of differences that have zero variances. Since
perfect correlation over the two occasions is seldom to be
expected, these special cases are trivial in practice.

[...] make explicit weighting systems that enter into the third,
fourth, and fifth types of measures [...] is implicit in any of these
measures of gain; this is true for “letting the variables weight
themselves” as well as for deliberately chosen weighting systems.
The weighting system affects to some degree the correlations of such
measures of gain with outside measures, and affects them in an
unpredictable fashion. The decision to employ a measure of gain
should be made only with the realization that a weighting system
will enter into the picture, and a rationale for choosing a
particular weighting system should be available at that time. Simply
“trying” each of the types to see which correlates best with a
variable of interest cannot lead to any useful generalizations.
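The claim about correlations with an outside measure can be illustrated with a small sketch. The scores below, including the `outside` variable, are invented for the purpose, and the helper functions are hypothetical; the sketch only shows that types 1 to 3 agree while type 4 need not.

```python
import math

def deviate(xs):
    """Deviation scores: each value minus the mean."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def standardize(xs):
    """Standard scores using the 'classic' standard deviation (divisor N)."""
    d = deviate(xs)
    sd = math.sqrt(sum(v * v for v in d) / len(d))
    return [v / sd for v in d]

def correlation(xs, ys):
    """Product-moment correlation as the mean product of standard scores."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

# Hypothetical pre- and post-measures plus one outside variable.
pre = [5.0, 8.0, 3.0, 2.0]
post = [7.0, 9.0, 6.0, 2.0]
outside = [1.0, 2.0, 4.0, 3.0]

crude = [b - a for a, b in zip(pre, post)]                              # type 1
deviated = deviate(crude)                                               # type 2
std_gain = standardize(crude)                                           # type 3
diff_std = [b - a for a, b in zip(standardize(pre), standardize(post))] # type 4

r_crude = correlation(crude, outside)
r_deviated = correlation(deviated, outside)
r_std_gain = correlation(std_gain, outside)
r_diff_std = correlation(diff_std, outside)
```

Here the pre- and post-measures have unequal variances, so `r_diff_std` departs from the common value shared by the first three; with equal variances the four would coincide.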

Of the five types, the fifth, residual gain, has a rationale that 
is rather interesting. The principle involved is the familiar one 
of separating a set of data into two uncorrelated parts (in the 
sample). It can readily be verified that in the univariate case, 
where one is concerned only with a single pair of pre- and post- 
measures, an analysis of variance of residual gains for two or 
more groups of subjects can be equated with the analysis of co- 
variance model. 

It also is interesting to observe the connection here with Guttman’s
(1953) notion of image analysis. If we consider only a single pair of
pre- and post-measures and expand the type 5 measure of gain times
its transpose, we secure three terms: $N$, $Nr_{jj}^2$, and $-2Nr_{jj}^2$, where
$r_{jj}$ designates the correlation across the two occasions. The variance
of this measure of gain (using N as a divisor) is, of course, $1 - r_{jj}^2$.
Had we conceived of the pair of pre- and post-measures as two
unit vectors with a common origin, but separated by an angle whose
cosine is $r_{jj}$, then projecting one vector onto the other would give a
projected vector of variance equal to $r_{jj}^2$. This projected vector is a
Guttman image of one variable in the space of another. The anti-image,
which can be conceived as the third side of the right-angled
triangle, the hypotenuse of which is the vector being projected, has
a variance equal to $1 - r_{jj}^2$, or the variance of this measure of gain.
Apparently, then, these type 5 measures of gain constitute a highly
specialized set of anti-images.
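The three terms can be written out explicitly. In the sketch below, $z_1$ and $z_2$ are assumed to denote the row vectors of standard scores for one measure on occasions 1 and 2 (so that $z_1z_1' = z_2z_2' = N$ and $z_2z_1' = Nr_{jj}$); these symbols are introduced here for illustration.

```latex
\begin{aligned}
(z_2 - r_{jj}z_1)(z_2 - r_{jj}z_1)'
  &= z_2z_2' + r_{jj}^2\, z_1z_1' - 2r_{jj}\, z_2z_1' \\
  &= N + Nr_{jj}^2 - 2Nr_{jj}^2 \\
  &= N\,(1 - r_{jj}^2).
\end{aligned}
```

Dividing by N gives the variance $1 - r_{jj}^2$ stated above.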


Intercorrelations of Gains 


Next let us look at the intercorrelations of measures of gain and
note some of the problems that may arise. We need not treat types
1 and 3, since their intercorrelations will be identical with those
for type 2. We already have suggested that four elements, which
may be written as a supermatrix as in (1), enter into the variance
of a measure of gain. Similarly, four elements enter into the
intercorrelations.




Consider first a type 2 measure of gain. The relevant supermatrix
is that of (1), which might be symbolized:

$$
\begin{Vmatrix}
C_{22} & C_{21} \\
C_{12} & C_{11}
\end{Vmatrix}
\tag{3}
$$

Then the matrix of intercorrelations of such gains would be given
by the normalized sum of matrices: $(C_{22} + C_{11} - C_{21} - C_{12})$. For
any two pairs of variables, j and k, the correlation between gains
has as a numerator $(c_{j_2k_2} + c_{j_1k_1} - c_{j_2k_1} - c_{j_1k_2})$, where the
lower case c's designate the sum of cross-products of deviation scores,
and as a denominator the product of the square roots of the variances
of the gains, which are respectively $(c_{j_2j_2} + c_{j_1j_1} - 2c_{j_2j_1})$ and
$(c_{k_2k_2} + c_{k_1k_1} - 2c_{k_2k_1})$. The interpretation of such a correlation
coefficient appears to be far from clear since it is in part a function
of the relative variances (range of talent) for both j and k on each
of the two occasions. In a moment we will comment on the factoring
of matrices of intercorrelations of gains. When we do, we will point
out that such factors (which will not be recommended) necessarily
differ for type 2 (or 3) and type 4 measures of gains in a fashion
that is analogous to the differences between factors of a covariance
matrix and a correlation matrix.

For type 4 measures of gains the intercorrelations are a function
of the elements of the supermatrix:

$$
\begin{Vmatrix}
R_{22} & R_{21} \\
R_{12} & R_{11}
\end{Vmatrix}
\tag{4}
$$

where R is used to symbolize a correlation matrix. $R_{11}$ and $R_{22}$ have
units in the diagonals; $R_{21}$ is the set of correlations across the two
[...]

$$
\begin{Vmatrix}
R_{22} & R_{21}D_r \\
D_rR_{12} & D_rR_{11}D_r
\end{Vmatrix}
\tag{5}
$$

again suppressing the scalar N. To secure intercorrelations of gains
so measured, we normalize $(R_{22} + D_rR_{11}D_r - D_rR_{12} - R_{21}D_r)$.
Note that non-zero elements of the diagonal matrix $D_r$ come from
the main diagonal of the square (but not symmetric) matrix $R_{21}$
(or $R_{12}$). We now have as the variance of gains, an expression of the
form $1 - r_{jj}^2$, or the variance of an anti-image. The covariances of
the gains are of the form
$(r_{j_2k_2} + r_{jj}r_{kk}r_{j_1k_1} - r_{kk}r_{j_2k_1} - r_{jj}r_{j_1k_2})$.
We have now documented the point that, like correlations with an
outside variable, the intercorrelations of gains shift in a
non-predictable manner as we shift the method of measuring gains.
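For the type 2 case, the expression built from the four blocks can be compared with the correlation computed directly from the crude gains. The sketch below uses invented scores for two measures, j and k, on two occasions; the function names are hypothetical and the lower-case sums of cross-products play the role of the c's in the text.

```python
import math

def deviate(xs):
    """Deviation scores: each value minus the mean."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def cp(xs, ys):
    """Sum of cross-products of deviation scores (a lower-case c in the text)."""
    return sum(a * b for a, b in zip(deviate(xs), deviate(ys)))

def gain_correlation_from_blocks(j1, j2, k1, k2):
    """Correlation between gains on j and k, built from the four blocks of (3)."""
    numerator = cp(j2, k2) + cp(j1, k1) - cp(j2, k1) - cp(j1, k2)
    var_j = cp(j2, j2) + cp(j1, j1) - 2 * cp(j2, j1)
    var_k = cp(k2, k2) + cp(k1, k1) - 2 * cp(k2, k1)
    return numerator / math.sqrt(var_j * var_k)

def gain_correlation_direct(j1, j2, k1, k2):
    """The same correlation computed from the crude gains themselves."""
    gj = [b - a for a, b in zip(j1, j2)]
    gk = [b - a for a, b in zip(k1, k2)]
    return cp(gj, gk) / math.sqrt(cp(gj, gj) * cp(gk, gk))

# Hypothetical scores: measures j and k on occasions 1 and 2, four persons.
j1, j2 = [5.0, 8.0, 3.0, 2.0], [7.0, 9.0, 6.0, 2.0]
k1, k2 = [1.0, 4.0, 2.0, 5.0], [3.0, 4.0, 5.0, 6.0]

r_blocks = gain_correlation_from_blocks(j1, j2, k1, k2)
r_direct = gain_correlation_direct(j1, j2, k1, k2)
```

The two routes agree because the cross-product of two differences expands into exactly the four block terms; the sketch makes no claim about types 4 or 5, whose blocks carry the additional weights shown in (4) and (5).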

Up to this point we have been concerned with demonstrating a 
lack of invariance of results over three types of measure of gain. 
It clearly is possible for the same basic data to yield somewhat
different conclusions, both in terms of the magnitude of change
as well as in terms of the correlates of change, when these three 
types of measure of change are employed. This is, of course, a 
rather bad situation. What we have done so far is to develop a 
notation that would permit a precise description of the measure 
of change being used and would also enable the investigator to 
reveal the elements entering into his obtained correlations of the 
change variables with themselves and with outside measures. 
Adopting and using such a notation would in some cases resolve 
controversies that rest on failure to communicate adequately, 
such as failing to distinguish standardized crude gains from dif- 
ferences in standard scores. Nevertheless, the situation still is bad. 
Let us try to remedy it by examining one or more modes of anal- 
ysis that may have certain features that are invariant over these 


three types of measure of gain. 


Factor Analysis of Gains


If we begin the search for invariance with the old stand-by
factor analysis, we can turn up some findings that will do two
things: (a) indicate that factoring the intercorrelations of gain
scores is probably a poor policy, and (b) suggest that we look
to the original variables, rather than the gains, for an analysis
that will have desirable properties.

Let us first indicate briefly that factoring a matrix of
intercorrelations of gains can be expressed as analogous operations on
the supermatrix of four elements which determine the
intercorrelations of gains. Let us take, arbitrarily, a type 2 measure,
the supermatrix for which is given in (3). To secure the
intercorrelations of gains, with units in the diagonal, we normalize
the sum of matrices $(C_{22} + C_{11} - C_{21} - C_{12})$. The matrix we wish
to factor, then, is either this normalized sum of matrices (which
implies a unit-variance type of factor analysis) or this same
normalized sum of matrices minus a diagonal matrix of unique variances
(which implies a communality-type factor analysis). We will not
stop to discuss the controversies that rage over these two points
of view in factor analysis, but will arbitrarily choose the first type.

Having computed the diagonal matrix, D, that normalizes the
sum of matrices, we could then write a new supermatrix like this:

$$
\begin{Vmatrix}
DC_{22}D & -DC_{21}D \\
-DC_{12}D & DC_{11}D
\end{Vmatrix}
\tag{6}
$$

Note that we have minus signs prefixing two of the matrices, and
that the matrix multiplication

$$
\begin{Vmatrix} I & I \end{Vmatrix}
\, M \,
\begin{Vmatrix} I \\ I \end{Vmatrix}
\tag{7}
$$

where M symbolizes the supermatrix written just above, yields
the normalized sum of matrices, or the matrix of intercorrelations
of gains. Now factors are, to state it very simply, linear composites
of the elements of a symmetric matrix. The standard methods
of factoring all proceed by choosing weights to apply to the
[...]ly such a set of weights to our form [...]ent to applying this
same set of weights [...]atrix (6) and then consolidating the
re[...]; before standardization, a first centroid sum for gain
variable j is simply the sum of the analogous first [...]

Although this type of correspondence exists, the shift from (6)
to (7) has a potential defect, since it [...] is substantially less
than [...]ists that an elegant set o[...] the gains, is inadequate
[...] from which (7) was co[...] gains, however, remains [...] of
measures. Had we used a [...] our factoring principle the same, we
still would get different factors of gains for the three measures.
Since we have established a correspondence between taking linear
composites of intercorrelations of gains and taking similar
composites of the four matrix elements, it seems reasonable to turn
our attention to the supermatrices themselves, which contain all the
information about relationships within and between pre- and
post-measures. These are much more tractable.
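The consolidation from (6) to (7) can be written out block by block. The sketch below only expands the product already stated in the text, with $I$ the identity matrix of the same order as each block.

```latex
\begin{aligned}
\begin{Vmatrix} I & I \end{Vmatrix}
\begin{Vmatrix}
DC_{22}D & -DC_{21}D \\
-DC_{12}D & DC_{11}D
\end{Vmatrix}
\begin{Vmatrix} I \\ I \end{Vmatrix}
&= DC_{22}D - DC_{21}D - DC_{12}D + DC_{11}D \\
&= D\,(C_{22} + C_{11} - C_{21} - C_{12})\,D .
\end{aligned}
```

The right-hand side is the normalized sum of matrices, so any set of weights applied to it corresponds to the same weights applied to each of the four blocks of (6) before they are summed.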


Factor Analysis of the Supermatrices 


In proposing the factoring of the supermatrices, we will limit 
the methodology to that of Rao’s (1955) canonical factors. This 
is a communality-type analysis, and as such is objected to by
some. For those who dislike communality-type analysis, we will 
propose a different method of attack in a subsequent section that 
may please them better. 

Rao's canonical factor analysis postulates a hypothetical set of
uncorrelated factor scores each of which is correlated with the
observed set of data. He then uses a standard canonical analysis to
arrive at an equation of the form:

$$|R - aU^2| = 0$$

in which R is the intercorrelation matrix with units in the diagonals
and $U^2$ is the unknown matrix of unique variances. The unique
variances are estimated; then the roots, a, and characteristic vectors
of the equation are solved for. This constitutes a first stage. New
estimates of the unique variances can then be made from the
results of the first stage, and a second cycle run. This continues
until convergence within a pre-assigned tolerance is attained.

One of the features of Rao's canonical factor analysis is that the
roots, a, which are related to the canonical correlations between
hypothetical factor scores and observed data, are invariant when
the equation is pre- and post-multiplied by a non-singular matrix
such as a diagonal matrix. As a result, canonical factors of the
correlation matrix will, when pre-multiplied by the appropriate
diagonal matrix, yield the canonical factors of, say, the
variance-covariance matrix. This relationship holds also for the
earlier scheme of Lawley.
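The invariance of the roots under rescaling follows from a short determinant argument. The sketch below assumes an arbitrary non-singular diagonal matrix, written here as $\Delta$ to avoid confusion with the D matrices defined earlier, and a characteristic vector $v$; both symbols are introduced only for this illustration.

```latex
\begin{aligned}
|\Delta R \Delta - a\,\Delta U^2 \Delta|
  &= |\Delta\,(R - aU^2)\,\Delta| \\
  &= |\Delta|^2\,|R - aU^2| ,
\end{aligned}
\qquad\text{so}\qquad
|\Delta R \Delta - a\,\Delta U^2 \Delta| = 0
\iff |R - aU^2| = 0 .
```

Likewise, if $(R - aU^2)v = 0$, then $(\Delta R \Delta - a\,\Delta U^2 \Delta)(\Delta^{-1}v) = 0$, so the characteristic vectors of the rescaled problem are obtained from the original ones by a diagonal transformation while the roots are unchanged.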

What we have, then, is a method of factor analysis in which
the scale of the variables is essentially separable, so that from
the analysis of the variables with a given scale (such as unit
variance for example) one can write the analysis that would result
with any other scale. Here then, is a mode of analysis of our
three types of measure of gain that will permit predicting the
results of the analysis of one from the analysis of any other. To see
this, recall the three supermatrices (3), (4), and (5), and note
that each transforms into another simply by pre- and
post-multiplication by the appropriate diagonal matrix.

A program for calculation probably would begin with super- 
matrix (4), since machine routines for securing the intercorrela- 
tions of any number of variables are generally available and are 
well understood. Further, such a routine would ordinarily make 
available the variances of each of the pre- and post-measures, 
which would be used if one eventually wishes to transform the
obtained canonical factors of (4) into those of (3). Further, the
routine would necessarily determine the elements of $D_r$, since these
are simply the elements of the main diagonal of $R_{21}$; these weights
would be employed in transforming the canonical factors of (4)
into those of (5), if they are desired.

Two schemes for making the initial estimate of $U^2$, the unique
variances, are to be preferred in this situation. The first would be
the use of one minus the squared multiple correlation of each
variable with all the others as the uniqueness estimate for the
variable. Inverting the supermatrix (4) gives the reciprocals of
these uniqueness estimates in the main diagonal of the inverse of
the supermatrix. This is a considerable chore, but if adequate
facilities are available, it probably should be done. On the basis of
other work, which has not yet been published, it is possible to state
that such initial estimates used in the Rao procedure have certain
very neat properties. For one, when such estimates are used in the
first stage of the Rao procedure, the number of roots, a, that are
larger than unity is exactly equal to Guttman’s (1954) “best” lower
bound to the number of common factors. For another, at the end of
this first cycle, it is possible, if one wishes, to output all the
factors of the complete image matrix as defined by Guttman (1953),
and the factors of the anti[...] common factors that would be required
according to Guttman’s criteria if sampling error, in the sense of
sampling persons, did not exist; but it also permits discarding those
that are not statistically significant in Rao’s sense. Thus factors
of the image matrix, common factors that form a Gramian matrix for
the set of persons considered as a population, and canonical factors
that satisfy a test of significance can all be secured from the
routine.

A second method of estimating uniqueness that probably is rather 
satisfactory in this situation is to use one minus the square of the 
correlation across the two occasions. This would give the same 
uniqueness estimate for variable j on occasion 1 and on occasion
2. When the correlation across occasions is substantial, such esti- 
mates should be fairly close to those obtained by inverting the 
supermatrix, since the correlation across occasions must be a lower 
bound to the multiple correlation. If these estimates are used ini- 
tially, one cannot of course secure the factors of the image or anti- 
image matrices precisely. 
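The two uniqueness estimates described above can be compared numerically. The following is a minimal sketch in Python with NumPy, not part of the original article; the simulated scores, the function names, and the layout of the supermatrix (occasion-1 variables first, occasion-2 variables second) are assumptions made only for illustration.

```python
import numpy as np

def smc_uniqueness(R):
    """One minus the squared multiple correlation of each variable with all others.

    The main diagonal of inv(R) holds the reciprocals of these estimates.
    """
    return 1.0 / np.diag(np.linalg.inv(R))

def cross_occasion_uniqueness(R, p):
    """One minus the squared correlation of each variable with itself across occasions.

    Assumes R is the 2p x 2p supermatrix with occasion 1 in rows/columns 0..p-1.
    """
    r_jj = np.diag(R[:p, p:])          # main diagonal of the cross-correlation block
    u = 1.0 - r_jj ** 2
    return np.concatenate([u, u])      # same estimate on both occasions

# Hypothetical pre- and post-measures sharing a common component.
rng = np.random.default_rng(0)
p, n = 3, 500
common = rng.normal(size=(n, p))
pre = common + 0.6 * rng.normal(size=(n, p))
post = common + 0.6 * rng.normal(size=(n, p))
R = np.corrcoef(np.hstack([pre, post]), rowvar=False)

u_smc = smc_uniqueness(R)
u_cross = cross_occasion_uniqueness(R, p)
print(np.round(u_smc, 3))
print(np.round(u_cross, 3))
```

Because the correlation across occasions is a lower bound to the multiple correlation, the second estimate can never fall below the first; the two should be close when the cross-occasion correlations are substantial.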

Such a package program gives at least three sets of factors that
may be worth study in any particular case. One would be the factors
of the image matrix. A second would be the complete set of
common factors, a la Guttman. A third would be that portion of
these common factors that are significant by Rao’s test. All three
sets could be transformed by pre-multiplication by the appropriate
diagonal matrix into the comparable factors for a different method
of measuring change; surprisingly, this is true for all three sets of
factors, though one would have to recognize that a supermatrix like
(5) implies that not all the vectors are taken at unit length. Whether
or not such factors, in their initially obtained form, are readily
interpretable is another question. No doubt rotations would aid in
interpretation in many cases; for this, analytic techniques probably
are preferable.


Canonical Analysis of the Supermatrices


Those who object to a communality-type factor analysis would
be better pleased by a canonical analysis of the supermatrices. If
we have a supermatrix like (4), the standard equation is:


$$|R_{12} R_{22}^{-1} R_{21} - b R_{11}| = 0$$
or
$$|R_{21} R_{11}^{-1} R_{12} - b R_{22}| = 0$$


316 EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 


both of which yield the same roots, b. Ordinarily we think of ca- 
nonical analysis as a method of extracting a largest root, b, to- 
gether with the weights on the variables that yield two composites 
maximally correlated. However, there is no necessity to stop with
the largest root. Instead one can extract all the non-zero roots and 
thus develop all the composites that are positively correlated. These 
account for all similarities over the two occasions; variance specific 
to either occasion may, of course, remain. It too can be examined, 
though for this purpose a modification of Tucker’s (1958) inter- 
battery method of factor analysis may be preferable. 

Like canonical factor analysis, a canonical analysis of the supermatrices
is not specific to the scale of the variables. Take, for
example, supermatrix (5) and work up the equation:


$$|D R_{12} R_{22}^{-1} R_{21} D - b D R_{11} D| = 0$$


to see that we have simply transformed one of the equations for
supermatrix (4) by pre- and post-multiplying by a diagonal matrix,
$D$. Such a transformation leaves the roots, b, invariant, and
merely alters the weights in a predictable fashion. Thus, again, we
can use supermatrix (4) for the analysis and transform the results
if we wish the analysis for a type 3 measure of change or a type
5 measure of change. Since the roots are identical, the measure
of relationship across occasions is invariant for all three measures
of change. As with canonical factor analysis, one can if he wishes
solve for all the non-zero roots, which treats the particular set
of persons as a population, or one can apply a test of significance
to determine which relationships between the occasions are
non-chance.
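The invariance of the roots under diagonal rescaling can be checked numerically. The sketch below is an illustration in Python with NumPy rather than anything given in the article; the simulated data, the function name `canonical_roots`, and the particular diagonal weights are hypothetical.

```python
import numpy as np

def canonical_roots(S11, S12, S22):
    """Roots b of |S12 S22^{-1} S21 - b S11| = 0, sorted in descending order."""
    M = np.linalg.solve(S11, S12 @ np.linalg.solve(S22, S12.T))
    return np.sort(np.linalg.eigvals(M).real)[::-1]

# Hypothetical pre- and post-measures on p variables.
rng = np.random.default_rng(1)
p, n = 3, 400
common = rng.normal(size=(n, p))
pre = common + 0.7 * rng.normal(size=(n, p))
post = common + 0.7 * rng.normal(size=(n, p))
R = np.corrcoef(np.hstack([pre, post]), rowvar=False)
R11, R12, R22 = R[:p, :p], R[:p, p:], R[p:, p:]

# Roots from the two equivalent equations for the correlation supermatrix.
roots_first = canonical_roots(R11, R12, R22)
roots_second = canonical_roots(R22, R12.T, R11)

# Rescale the occasion-1 variables by an arbitrary positive diagonal matrix D
# (assumed weights, standing in for whatever diagonal matrix a given measure implies).
D = np.diag([0.5, 2.0, 1.5])
roots_scaled = canonical_roots(D @ R11 @ D, D @ R12, R22)

print(np.round(roots_first, 4))
print(np.round(roots_scaled, 4))
```

Only the weights on the variables change under the rescaling; the roots, and hence the measure of relationship across occasions, do not.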


Interbattery Factors


A third attack on the supermatrices might be along the lines
of Tucker’s (1958) interbattery


ply factor techniques to these residuals to secure factors specific 
to the two occasions. 


The factors common to the two occasions may, of course, be rotated.
Tucker suggests rotating them separately, but this would
seem to be a poor policy for the type of problem we are considering.
Tucker begins the development of his method by, in effect, requiring
that the factor scores be identical for the factors that are
common to the two systems (occasions). He shows that this requirement
leads to the analysis of $R_{12}$, or the cross-correlations.
It is possible to add the converse, which Tucker did not prove;
namely, that the analysis of $R_{12}$ necessarily defines the factor
scores as identical for the factors that are common to the two occasions.
In their original form, then, the factors common to the
two sets of variables are associated with a single set of factor
scores describing the persons. Separate rotation necessarily alters
the situation to one in which the factors for occasion 1, say, are
associated with factor scores that are different from the factor
scores associated with the factors for occasion 2. This is true for
orthogonal as well as oblique rotations.

The problem suggested here is one of holding something constant.
We can readily hold the factor scores constant over the two occasions
by rotating both sets of factors simultaneously and in the
same fashion. This then permits us to examine how the variables
“change” from one occasion to the other when we assume that the
persons have not “changed.” This may do some violence to your
thinking. However, the position is a reasonable one if we grant
that the “same” measurement operation may simply weight the
individual's relative standing differently on two different occasions.
An alternative is to hold the factors constant, reflecting the “same”
measurements, and allow the factor scores for the two occasions to
differ. This seems to be less easy to accomplish. The third alternative
is to change both factors and factor scores, which results from
separate rotation; it appears to have no virtues.

A major difficulty with the interbattery approach is that it does
not appear to be invariant in any sense over the three methods of
measuring change.


Conclusion


We have examined five methods of measuring change, have reduced
them to three different types, in terms of their relationships
with other variables, and have looked at four methods of analysis 
of such change data. One is the factoring of the interrelationships 
of the change measures themselves, which is not invariant over the 
three different types of measure. The others are canonical factor 
analysis of the supermatrix, canonical analysis of the supermatrix, 
and an interbattery method of analysis. Of these, the first two 
have a certain invariance over the methods of measuring change, 
but the third does not. These last three methods offer a means of 
describing, in terms of constructs or hypothetical or derived var- 
iables, the interrelations among any set of pre- and post-measures. 
They differ in several respects. Canonical factor analysis defines 
the constructs or hypothetical variables as projections from the 
space of the original variables into a common factor space. Thus 
the factor scores associated with such factors cannot be calculated 
precisely, but must be estimated by some principle. This is a gen- 
eral characteristic of all communality-type solutions of any ma- 
trix. Whether or not it is a desirable characteristic depends upon 
the point of view. The argument, represented roughly by the point 
of view of Truman Kelley versus that of Thurstone, still continues. 

Associated with the canonical factors, if one follows the computing
program outlined above, is the set of factors of the image
matrix and of the anti-image matrix. For the problem of describing
change over the two occasions, these may have particular utility.
The image matrix appears to describe the similarities over the two
occasions in terms of factors that are located in the space of the
variables, rather than as projections into a common factor space.
The factors of the anti-image matrix can be interpreted as constructs
describing specificity of the variables, when specificity is
defined in a particular fashion. A problem which we have not yet
explored is that of whether there is any rational and easily derived
[...]uld relate the factors of the anti-image matrix
to the factors specific to the two occasions derived by a canonical
analysis of the supermatrices (in distinction to canonical factor
analysis) or by the interbattery approach. Someone may be
interested in exploring this.


The canonical analysis, like canonical factor analysis, has an invariance
over the measures of change which is quite appealing. It,
too, has a reasonable test of significance associated with it which
makes it desirable when one is concerned about sampling error associated
with the sampling of subjects. The results of such an
analysis can be put together in a package of factors all of which
are located in the variable space, and thus factor scores can be
computed rather than estimated. Such factors can, of course, be
rotated to facilitate interpretation if it is desired.

The interbattery approach, modified to fit this situation, yields
factors common to the two occasions and factors specific to each
of the two occasions. An approximate test of significance is available.
The rotation problem in this case raises the interesting question
of whether or not we wish to hold the factor scores constant
over the two occasions for the factors common to the two occasions.
The major defect of this approach is that we get somewhat different
results for the three different methods of measuring change.

Another attack on the problem, which we have not touched on
here, is the analysis of the data for the two occasions into the conventional
“between” and “within” covariance matrices. This has
some interesting relations to the other methods but will have to
wait for explication.


REFERENCES 


DuBois, Philip H. Multivariate Correlation Analysis. New York:
    Harper and Brothers, 1957.
Greenhouse, Samuel W. and Geisser, Seymour. “On Methods in the
    Analysis of Profile Data.” Psychometrika, XXIV (1959),
    95-112.
Guttman, Louis. “Image Theory for the Structure of Quantitative
    Variates.” Psychometrika, XVIII (1953), 277-296.
Guttman, Louis. “Some Necessary Conditions for Common-Factor
    Analysis.” Psychometrika, XIX (1954), 149-162.
Lord, Frederic M. “The Measurement of Growth.” EDUCATIONAL
    AND PSYCHOLOGICAL MEASUREMENT, XVI (1956), 421-437.
Lord, Frederic M. “Further Problems in the Measurement of
    Growth.” EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT,
    XVIII (1958), 437-451.
McNemar, Quinn. “On Growth Measurement.” EDUCATIONAL AND
    PSYCHOLOGICAL MEASUREMENT, XVIII (1958), 41-55.
Rao, C. R. “Estimation and Tests of Significance in Factor Analysis.”
    Psychometrika, XX (1955), 93-111.
Tucker, Ledyard R. “An Inter-Battery Method of Factor Analysis.”
    Psychometrika, XXIII (1958), 111-136.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 2, 1962


ITEM SAMPLING AS A SUFFICIENT BUT UNNECESSARY
REQUIREMENT FOR PRECISE MENTAL TESTING


HAROLD WEBSTER


Center for the Study of Higher Education
University of California, Berkeley 


IN a previous paper (Webster, 1960) the writer derived a gen- 
eralization of the Kuder-Richardson reliability formula 21. This 
note is intended to clarify the meaning of “item sampling” as discussed
in conjunction with this generalization.

Is it in fact items themselves that are sampled in the case of
mental testing? It is not. For any actual testing situation it is a
set of numbers that are sampled, numbers which are associated
with responses of any given subject to the items. The fact that item
responses, as opposed to items themselves, comprise the random
samples with which we must work enables us to improve slightly
the analogy discussed in the previous paper between physical and
psychological measurement. A closer look at the distinction between
items and item responses will improve our understanding of
mental measurement.

Each item used in a mental test can be regarded simply as a part
of a measurement operation. One might measure a subject’s height
a number of times, instead of just once, in order to increase the
precision of the final height estimate. So it would be in the case of a
mental characteristic such as an attitude; we would use his
responses to k attitude items, instead of just one, in order to estimate
more precisely his “true” attitude.

In the case of the height measurement trials, a succession of
rulers and tapes which need not be random samples might be
employed to obtain a random sample of height values. Similarly, a
succession of attitude items, although not themselves a random
sample, will provide a set of numbers which may be regarded as
one of a very large number of sets that could have been obtained
for the subject. Item sampling, as opposed to item response sampling,
is therefore a sufficient but unnecessary condition for achieving
precise mental measurement.

It is known, of course, that increasing the uniformity of a set of 
measurement operations, by standardizing the measurement instruments
used and the procedures by which they are applied, will
reduce error variance. (Here we ignore problems of bias.) Ob- 
viously we cannot administer “the same” test item k times to our 
subject, in a test k items in length, if only because of his time- 
dependent processes, which also make it impossible, regardless of 
the wording of the items, for constant experimental conditions to 
hold for him throughout the trials (the item administrations). But 
we can administer k “similar” items, as defined below, specifying
that we may think of the trials as experimentally independent only 
if we ignore these temporal effects, or memory, practice, fatigue, 
learning, and the like. 

The primary purpose of measurement in science is to distinguish 
objects, each from others, in a systematic and efficient way. (We 
disregard another purpose, which is to estimate a natural constant, 
say, the velocity of light, for its own interest.) In mental measure- 
ment we wish to increase the measurement variance between per- 
sons, while maintaining a relatively small error variance. The re- 
liability coefficient, a measure of the degree to which we succeed 
in doing this, is therefore an important statistic. 

There are two item similarity restrictions imposed in most mental
testing experiments, and they are no different logically from
restrictions also employed in physical measurement. These two conditions
will be referred to as uniform item scoring and valid item
content.


It is obvious that, in an attempt to distinguish persons according
to their heights, the reliability coefficient would approach unity
only if tapes, rules, or other “similar” linear devices with common
units had been applied in the successive measurement trials; it is
equally clear that the reliability measure would vanish, or become
negative, either if the linear devices used did not possess common
units, or if highly “dissimilar” instruments had been employed in
the successive operations.




Now there are no common units among mental test items, in the
sense that equal intervals might be marked on both tapes and rules; 
but we can approximate a measurement experiment in which com- 
mon units are used, and we can show that the extent to which such 
an approximation fails will be reflected by a decreased value for 
the measure of reliability. First, we can decide to use for our k 
trials only items each of which is scored +1 or 0; or, say, only 
items scored 1, 2, 3, 4 or 5. If our philosophy of measurement con- 
tends that in successive measurement trials we hope to obtain 
values that vary little in magnitude, then this is equivalent to 
desiring a small error variance for the trials. (For example, in us- 
ing a test with items scored 1-5, we would expect a person scoring 
4 on a few items also to score 4 on many other items.) Conse- 
quently, we want the mean, over subjects, of their variances across 
the k item scores to be small; but as was shown in the previous 
article, with increasing k this mean is a constant (1/k) times the 
estimated test error variance, which (when expanded) contains a 
term that is k times the variance of the item means. It is therefore
obvious that even though increasing the variation in item
means may increase the variance between persons, it nevertheless
always contributes to the measurement error of the test scores;
and, consequently, it also contributes to some decrease, either in
Kuder and Richardson’s reliability formula 21, or in the generalization
of it. Is a test ever improved, then, by increasing the heterogeneity
of the item means? Possibly such heterogeneity should be
increased for certain kinds of aptitude or achievement tests, but
there is no reason to suppose that it is useful generally in mental
testing. In fact, it is easy to show that heterogeneous item means
constitute a definite handicap, in terms of the precision of measurement,
if they are tolerated in personality scales.
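The effect of heterogeneous item means can be illustrated with the classical Kuder-Richardson formulas for items scored 1 or 0. The sketch below is in Python with NumPy and is only an illustration: the generalized formula of the previous article is not reproduced in this note, so the familiar formulas 20 and 21 stand in for it, and the simulated responses and function names are hypothetical. For dichotomous items, formula 20 exceeds formula 21 by exactly k/(k-1) times k times the variance of the item means, divided by the total-score variance.

```python
import numpy as np

def kr20(X):
    """Kuder-Richardson formula 20 for a persons x items matrix of 0/1 scores."""
    k = X.shape[1]
    p = X.mean(axis=0)                      # item means
    total_var = X.sum(axis=1).var()         # variance of total scores (ddof=0)
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total_var)

def kr21(X):
    """Kuder-Richardson formula 21, which uses only the mean and variance of totals."""
    k = X.shape[1]
    total = X.sum(axis=1)
    m, total_var = total.mean(), total.var()
    return (k / (k - 1)) * (1 - m * (k - m) / (k * total_var))

# Hypothetical responses: 400 persons, 10 items with unequal item means.
rng = np.random.default_rng(3)
ability = rng.normal(size=(400, 1))
difficulty = np.linspace(-1.5, 1.5, 10)
prob = 1.0 / (1.0 + np.exp(-(ability - difficulty)))
X = (rng.random(size=prob.shape) < prob).astype(float)

k = X.shape[1]
item_mean_var = X.mean(axis=0).var()        # variance of the item means (ddof=0)
gap = (k / (k - 1)) * k * item_mean_var / X.sum(axis=1).var()

print(round(kr20(X), 4), round(kr21(X), 4), round(gap, 4))
```

The printed gap equals the difference between the two coefficients, so any spread in the item means lowers formula 21 relative to formula 20.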

In addition to uniform item scoring, the second kind of item 
similarity condition, valid item content, also has its analogue in 
physical measurement. In writing test items we attempt to make 
them appropriate, in terms of some theory, for the characteristic 
we intend to measure. Given the fact that the items will be tried
out empirically, however, this second restriction is no more necessary
logically than is uniform item scoring. In fact, both item
similarity conditions are imposed merely because it has been found,
experimentally, that without them the values obtained by means
of the measurement operations will not order persons effectively
(distinguish them with precision over a single dimension). 

If measurement operations cannot be found that will discrim- 
inate objects, then the characteristic which the investigator wishes 
to study remains undefined scientifically. For example, use of our 
tapes and rules will produce quite similar, or homogeneous, height 
measurement values by comparison, say, with tapes, thermometers, 
balances, or other heterogeneous devices, which might also be em- 
ployed (assuming abysmal scientific ignorance) in an attempt to 
estimate a subject’s height. The important point is that a criterion 
of similarity, for either physical or mental measurement trials, can 
be used to assess the success of the measurement experiment. In
mental measurement this criterion is known as the reliability 
coefficient. 

Formula 5a, the generalized Kuder-Richardson measure of the
previous article, is such a criterion. The only change now helpful
in interpreting 5a is that, aside from problems of bias, it is irrelevant
whether the items themselves, as distinct from the item
responses, are regarded as random samples.


REFERENCES 


Lord, Frederic M. “Statistical Inferences About True Scores.”
    Psychometrika, XXIV (1959), 1-17.
Webster, H. “A Generalization of Kuder-Richardson Reliability
    Formula 21.” EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT,
    XX (1960), 131-138.


1 1 ses 
Lord's (1060) Bond that if item scoring restrictions are relaxed, then 


, for linear prediction of true scores, has the same 
in the present article. The two formulas 


order to represent the sampling Men 
enn | 7 or type-2, sampling assumptions. Also, Lo! 
method of estimation differs from that discussed above. It does not depend
upon analysis of variance, or of variance components.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 2, 1962


THE ANXIETY DIFFERENTIAL:
INITIAL STEPS IN THE DEVELOPMENT
OF A MEASURE OF SITUATIONAL ANXIETY¹ ²


SHELDON ALEXANDER³


Southern Illinois University 
AND 


THEODORE R. HUSEK 


University of California, Los Angeles 


ANXIETY has long played a key role in psychological theory and
research. However, currently available measures of anxiety often
provide confusing or contradictory results when used by different
experimenters. In addition, attempts to determine whether or not 
the various measures are related to one another have led to rather 
disappointing results (Dibner, 1958; Goodstein, 1954; Mandler, 
Lindzey & Crouch, 1957; Martin, 1958; Raphelson, 1957). 

The need for adequate measures of anxiety has been emphasized
in a number of recent reviews of this area of research (Blake &
Mouton, 1959; Jenkins & Lykken, 1957; Jensen, 1958). Blake and
Mouton (1959, p. 212) conclude their review by stating:


Critical problems must be solved before a satisfactory laboratory-based
theory of anxiety and its effect on adjustment is developed.
. . . One [critical problem] is with respect to the construction
of one or more valid anxiety measures. Without an
adequate instrument interpretation of results from experimental
work is rendered extremely difficult.


¹ This investigation was carried out while the authors were at the University
of Illinois, and was partially supported by a research grant (3M-9067) from the
National Institute of Mental Health to Charles E. Osgood and Jum C. Nunnally,
Institute of Communications Research, University of Illinois. Charles E. Osgood
participated with the authors in the planning stages of the research
reported here. We would like to express our deep appreciation to Dr. Osgood for
his many thoughtful comments, and for his wholehearted support of the research.
We would also like to express our gratitude to Emory L. Cowen, Howard
Maclay, Jum C. Nunnally, and Edward Ware for their comments on an earlier
version of the manuscript. We would like to thank John Kittross for helping
carry out Study II.
² Both authors shared equal responsibility for the planning and conduct of the
research, and the preparation of this report.
³ This research was carried out while Alexander was a USPHS Post-Doctoral
Research Fellow at the University of Illinois.


The present report describes an attempt to develop an adequate 
verbal response measure of situational anxiety. To maximize its 
research utility, it was felt that such a measuring instrument should 
meet certain criteria in addition to those usually considered in discussions
of reliability and validity. The instrument should: (a) be
of such a nature that subjects are unlikely to falsify their responses, 
(b) not be susceptible to response sets, (c) be scorable by an ob- 
jective, nonjudgmental key, (d) be easily administered to groups 
of any size, (e) be short, and (f) be inexpensive. The instrument 
described in this report seemed to possess the potential for satis- 
fying these criteria. 


Rationale for the Proposed Measuring Instrument 


It was our basic assumption that the person who is anxious for
a short period is in a different state and perceives things differently
from when he is not anxious. Among the changes produced by anxiety
states are changes in cognition, that is, changes in the meanings
of various events, persons, objects, and ideas. Such changes could 
be used as indicators of anxiety if: (a) there were a consistent set 
of changes over most individuals, and (b) these changes could be 
measured. Changes which subjects did not realize had occurred 
would be especially desirable. 

The semantic differential (Osgood, Suci & Tannenbaum, 1957) 
seemed well suited as a technique for measuring the cognitive 
changes which presumably accompany the anxious state. The 
semantic differential is especially sensitive to those aspects of 
cognition which Osgood has labeled “connotative” (“the affective
‘feeling tones’ of meaning”). In addition, the semantic differential
possesses the practical advantages of speed and ease of administra- 
tion and scoring. 

In adapting the semantic differential technique for the measurement
of anxiety, an approach was utilized which differed from the
customary manner in which this technique has been applied. In the
conventional use of the semantic differential the scales paired with
a concept are obviously relevant to that concept; e.g., PRESIDENT
EISENHOWER: effective-ineffective. However, for the
measurement of anxiety a large number of novel pairings of scales
and concepts were introduced (e.g., DREAMS: loose-tight;
[...]S: deep-shallow, etc.). It was hoped that these novel combinations
of scales and concepts would be difficult to fake; that is,
it would be difficult for subjects to discern what was a “good” or
“desirable” response.

The first step in the attempt to develop an Anxiety Differential
(AD) was to find a preliminary set of concepts and scales which
might differentiate between anxious and non-anxious states. On
the basis of the results of a pilot study, and hypotheses concerning
the nature of anxiety, 68 items were constructed. The next step
was to determine experimentally which items would be sensitive


STUDY I
Experimental Procedure
Experimental Stimuli
It was decided that a strong anxiety-inducing stimulus was
needed to obtain items which were likely to be anxiety indicators.
The stimulus finally chosen was a color film of a surgical operation
on the frontal sinus of a human patient. This twelve-minute
film was selected as the most anxiety-provoking, after extensive
[...]ing of films by the staff of the Institute of Communications
Research of the University of Illinois.

A quiet travelog about Nova Scotia was obtained for the control
group. A third film was chosen to serve as an emotional catharsis
for the experimental subjects. It was a film of football highlights
with many action scenes.

Subjects
The subjects were 247 paid, male volunteers, predominantly college
freshmen. They were all members of the ROTC (required of
all male undergraduates). They volunteered for an experiment in
which they would “be shown some films” but were not told what
kind of films. One hundred and nineteen subjects were randomly
assigned to the main experimental group, 81 were assigned to the
control group, and 47 were assigned to a “fake” group.

⁴ The authors would like to thank Mrs. Maria Ikenberg of the Illinois
Eye and Ear Infirmary for [...] this film, and for her aid in obtaining a
number of other films of this type.


Testing Procedure 


All three groups of subjects were tested in one evening, each in 
a separate room. The 119 experimental anxiety subjects were told
that they would see two movies and were then asked to fill out the
68-item semantic differential booklet.⁵ Each item was presented in
the following manner: 


DREAMS 


loose__: : tight 


The instructions given to the subjects were almost identical to 
the “typical instructions” described by Osgood, Suci, and Tannen- 
baum (1957, pp. 82-84). 

After filling out the booklet, the subjects were told about the 
surgical operation film, what it would cover and the fact that it 
was in color. Subjects were told that they were free to leave the 
room at any time and would receive their money even if they left.
Several subjects did leave during the showing of the movie, but all 
except two subjects were able to fill out the post-test forms in the 
hallway outside and are included in the experimental group sample. 
After viewing the operation film, the subjects were reminded that 
they would see another (unspecified) film and were then asked to 
fill out several forms. These forms included a retesting of the 68 
semantic differential items, and the Nowlis-Green adjective check- 
list designed to measure momentary moods (Green & Nowlis, 1957; 
Nowlis & Green, 1959). The subjects were then shown the tension-releasing
football film. The control group of 81 men saw the travelog
instead of the operation film, but otherwise the procedure was the
same.

⁵ Two random orders of items were used in an attempt to eliminate order and
[...]. Each item was typed on an index card, and the cards were completely
shuffled to arrive at the first random order of items. The cards were then
completely reshuffled to give the second random order of items. Half of all
subjects received one order of items on the pretest and half received the other
order of items. On the post-test each subject received the order different from
the one he had on the pretest.

A “fake good” group of 47 men viewed the same films as the experimental
anxiety group but filled out the second group of tests
under a special set of instructions. They were told, “The films that 
we are showing tonight are fairly disturbing to most people. We 
are interested in finding out how well people can cover up the fact 
that they are bothered by the films. Please fill out this booklet as 
if the film had not disturbed you. Pretend you feel fine and that 
you are not at all bothered.” 


Results 

Item Selection 

The first task was to determine which items had been sensitive 
indices of anxiety. Three scores were obtained for each subject for 
each item: pretest score, post-test score, and change score (pretest 
minus post-test). Means and standard deviations were computed 
for each of these scores for each of the three groups of subjects. 
Comparisons were then made between the pretest and the post-test 
responses of the subjects in the anxiety group, and t tests were
performed on the pretest-post-test differences. Eighteen items
yielded differences at the .05 level of confidence, 26 items yielded
differences at the .10 level, and 33 items yielded differences at the
.20 level.⁷ Comparing the experimental anxiety and the control
groups on item change scores (pre-post) disclosed 14 items differing
at the .05 level of confidence, 21 items at the .10 level, and 27
items at the .20 level. Comparing the change scores of the “fake
good” and control groups disclosed four items at the .05 level, seven
items at the .10 level, and 11 items at the .20 level.

⁶ [...] following information has been deposited with the American
Documentation Institute: (1) means and standard de[...] groups, for each of
the 68 items in Study [...] from the factor analysis of the [...]
Photoduplication Service, Libr[...] in advance $2.50 for photocopi[...]
⁷ All confidence levels in the Study [...]

Comparisons were also made between the treatment conditions 
for the post-test scores only. The anxiety and control groups were 
differentiated by 10 items at the .05 level of confidence, 12 items at 
the .10 level, and 21 items at the .20 level. The “fake good” and
control groups were differentiated by two items at the .05 level, by 
four items at the .10 level, and by seven items at the .20 level. 
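The item-level comparisons reported above rest on t statistics computed item by item. A minimal sketch of such a screening in Python with NumPy follows. It is not the authors' computation: the simulated scores, the function names, and the use of an unpooled-variance (Welch-type) statistic for the between-group comparison are assumptions made here for illustration, since the report does not say which form of the t test was applied.

```python
import numpy as np

def paired_t(pre, post):
    """t statistic per item for the mean change score (pretest minus post-test)."""
    d = pre - post
    n = d.shape[0]
    return d.mean(axis=0) / (d.std(axis=0, ddof=1) / np.sqrt(n))

def welch_t(a, b):
    """Unpooled-variance t statistic per item for two independent groups."""
    se = np.sqrt(a.var(axis=0, ddof=1) / a.shape[0]
                 + b.var(axis=0, ddof=1) / b.shape[0])
    return (a.mean(axis=0) - b.mean(axis=0)) / se

# Hypothetical scores on 5 items; only item 0 is made sensitive to the film.
rng = np.random.default_rng(2)
n_items = 5
pre_anx = rng.normal(4.0, 1.0, size=(119, n_items))
post_anx = pre_anx + rng.normal(0.0, 1.0, size=(119, n_items))
post_anx[:, 0] += 1.0
pre_ctl = rng.normal(4.0, 1.0, size=(81, n_items))
post_ctl = pre_ctl + rng.normal(0.0, 1.0, size=(81, n_items))

# Pre-post differences within the anxiety group.
t_within = paired_t(pre_anx, post_anx)
# Anxiety versus control on the change scores.
t_between = welch_t(pre_anx - post_anx, pre_ctl - post_ctl)

# Rank items from most to least discriminating on the between-group comparison.
ranking = np.argsort(-np.abs(t_between))
print(np.round(t_within, 2), np.round(t_between, 2), ranking)
```

Items would then be retained or dropped by comparing each statistic with the critical value for the chosen level of confidence.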

The next step was to try to obtain the optimum combination of 
items that could then serve as an index of anxiety in future re- 
search. A modification of the procedure discussed by Thorndike 
(1949, pp. 247-248) was used. Six preliminary scales involving 
various item combinations were developed in order to meet dif- 
ferent requirements of the test. The first four scales were designed 
for situations where change scores (pre-post differences) would be 
available. The fifth and sixth scales were designed for situations 
where only one testing of the subjects would be possible. 

Scale One consisted of the 14 items which showed both large 
pre-post differences in the anxiety group, and large differences be- 
tween the anxiety and control groups. All 14 items showed pre-post 
differences beyond the .10 level of confidence, and 13 of the 14 
items yielded anxiety-control differences beyond the .10 level (the 
other item was at the .13 level). Scale Two consisted of the 14
items comprising scale one, plus four additional items which 
showed anxiety-control differences (.05 level), making a total of 
18 items in scale two. Scale Three consisted of the 14 items in scale 
one plus four additional items which showed pre-post differences in 
the anxiety group (.05 level). Scale Four was an attempt at a 
“fake-proof” test and consisted of 10 items which met two criteria. 
The items had to show a difference between the anxiety and the 
control groups (at the .20 level of confidence or better) and also a 
difference between the “fake good” and control groups (beyond the 
.20 level). 
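
The selection rules just described amount to keeping the items whose significance levels meet every criterion set for a scale. A small sketch of the Scale Four rule follows; the item names and p values are invented, and `select_items` is a hypothetical helper rather than anything used in the study.

```python
def select_items(p_values, criteria):
    """Return the items whose p values meet every (comparison, cutoff) pair."""
    return [
        item
        for item, ps in p_values.items()
        if all(ps[comparison] <= cutoff for comparison, cutoff in criteria.items())
    ]


# Invented p values for three hypothetical items.
p_values = {
    "item_a": {"anxiety_vs_control": 0.03, "fake_vs_control": 0.15},
    "item_b": {"anxiety_vs_control": 0.04, "fake_vs_control": 0.60},
    "item_c": {"anxiety_vs_control": 0.45, "fake_vs_control": 0.10},
}

# Scale Four required both differences at the .20 level or better.
scale_four_rule = {"anxiety_vs_control": 0.20, "fake_vs_control": 0.20}
scale_four_items = select_items(p_values, scale_four_rule)
print(scale_four_items)
```

Only `item_a` satisfies both criteria in this invented example; the other scales would use the same helper with different comparisons and cutoffs.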

The last two scales were based on post-test responses only. Scale 
Five contained the 10 items which showed large differences between
the anxiety and the control groups (all beyond the .05 level of con- 
fidence). To these 10 items were added five additional items to 
form Scale Six. (These five additional items had p values ranging
from .06 to .13.) The items making up each of these anxiety scales 


ALEXANDER AND HUSEK 331 


TABLE 1

The 28 Semantic Differential Items
Which Compose the Six Anxiety Scales

                                            Anxiety Scales
Item                              One   Two   Three   Four   Five   Six

 1. ME: frightened-fearless
 2. DREAMS: loose-tight
 3. TROUBLE: here-there
 4. MY MIND: loose-tight
 5. ME: worried-carefree
 6. MOVIES: wet-dry
 7. LITTLE BOYS: safe-dangerous
 8. ME: jittery-calm
 9. HANDS: loose-tight
10. YESTERDAY: near-far
11. BREATHING: careful-carefree
12. FINGERS: loose-tight
13. ANXIETY: clear-hazy
14. HANDS: good-bad
15. HANDS: wet-dry
16. EYES: large-small
17. MOVIES: cold-hot
18. SCREW: strong-weak
19. GERMS: deep-shallow
20. ME: helpless-secure
21. SCREW: nice-awful
22. THE REAL ME: hard-soft
23. FINGERS: stiff-relaxed
24. MOVIES: loose-tight
25. SCREW: clean-dirty
26. ME: wet-dry
27. SCREW: loose-tight
28. TODAY: curved-straight

[The X marks showing which scales include each item are not legible in this copy.]

Note. For those items on the anxiety scales, the italicized adjective indicates the side of the
item keyed for anxiety.


are indicated in Table 1. It should be noted that there are a large
number of overlapping items on many of the scales.

Next, the total score on each one of the six scales was obtained 
for each subject. The scores of the subjects in the anxiety group 
were then correlated with their scores on the anxiety factor of the 
Nowlis-Green adjective checklist. The correlations between the 
adjective checklist measure and the anxiety scales were: scale one,
.52; scale two, .52; scale three, .50; scale four, .48; scale five, .62;
scale six, .63. All of these correlations are significant beyond the
.001 level of confidence.
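
As an illustration of how such validity coefficients are obtained, the sketch below computes a product-moment correlation between scale totals and checklist scores. The scores are invented and the helper name `pearson_r` is an assumption; only the formula itself is standard.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented totals for six subjects: an anxiety scale and a checklist factor.
scale_scores = [12, 5, 9, 15, 3, 8]
checklist_scores = [7, 4, 5, 9, 2, 6]
r = pearson_r(scale_scores, checklist_scores)
print(f"r = {r:.2f}")
```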

The means and standard deviations for the three groups of subjects
were obtained for each of the six anxiety scales. These are
presented in Table 2.


TABLE 2
Means and Standard Deviations for the Six Anxiety Scales

          Anxiety Group      Control Group       Fake Group
Scale     Mean     S.D.      Mean     S.D.      Mean     S.D.
1a         8.2     11.0      -0.8      5.6       1.2      6.3
2a         8.8     13.9      -2.9      6.7       0.7      6.4
3a         9.9     11.5       1.0      6.7       2.8      7.2
4a         1.9      6.8      -2.5      5.2       1.7      4.4
5b        41.9     11.4      48.7      7.6      48.8      5.4
6b        61.5     14.0      69.4      9.0      70.0      7.3

a Scores based on pretest-post-test differences. The higher the score, the greater the anxiety.
b Scores based on the post-test. The lower the score, the higher the anxiety.

Since the scales were developed from the present data, it was not 
deemed legitimate to test the statistical significance of the differ- 
ences between the three groups. Such comparisons would be capi- 
talizing on chance factors. However, for the reader who is nonethe- 
less interested in seeing what the t-values look like, we present 
them in Table 3. They will give some indication of the upper 


TABLE 3
t-Test Comparisons between the Various Groups of Subjects

          Anxiety-Control     Fake-Control     Anxiety-Fake
Scale            t                  t                t
1               7.5                1.8              5.1
2               7.9                3.0              5.0
3               6.8                1.4              4.7
4               5.2                4.9              0.2
5               5.0                0.1              5.2
6               4.8                0.4              5.0


bound for significance with the Anxiety Differential. Because of
the capitalization on chance factors, extreme caution should be
used in interpreting the values.
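
The caution about capitalizing on chance can be made concrete with a small simulation. In the sketch below every item is pure noise, yet a scale built from the items that happen to separate two groups best yields a large t in the sample used for selection and not in a fresh sample. The group sizes, item counts, unpooled t formula, and function names are all assumptions chosen for illustration; nothing here reproduces the study's data.

```python
import math
import random
from statistics import mean, stdev


def two_group_t(a, b):
    """Two-sample t statistic with an unpooled standard error (assumed form)."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se


def simulate(n_per_group=30, n_items=60, n_keep=10, seed=0):
    """Select items on one noise-only sample, then score a fresh sample."""
    rng = random.Random(seed)

    def sample():
        return [[rng.gauss(0.0, 1.0) for _ in range(n_items)]
                for _ in range(n_per_group)]

    dev_a, dev_b = sample(), sample()  # development groups, no true effect
    item_t = [two_group_t([s[j] for s in dev_a], [s[j] for s in dev_b])
              for j in range(n_items)]
    # Keep the items with the largest group differences in the same data.
    keep = sorted(range(n_items), key=lambda j: item_t[j], reverse=True)[:n_keep]

    def totals(group):
        return [sum(s[j] for j in keep) for s in group]

    t_same = two_group_t(totals(dev_a), totals(dev_b))
    new_a, new_b = sample(), sample()  # cross-validation groups
    t_fresh = two_group_t(totals(new_a), totals(new_b))
    return t_same, t_fresh


results = [simulate(seed=s) for s in range(20)]
mean_same = mean(r[0] for r in results)
mean_fresh = mean(r[1] for r in results)
print(f"same-sample t: {mean_same:.2f}, fresh-sample t: {mean_fresh:.2f}")
```

The same-sample values are inflated by selection alone, which is why a t obtained on the development sample can serve only as an upper bound.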


Factor Analysis 


A centroid factor analysis was performed for the post-test Anxiety
Differential item scores for the anxiety group. The results
of this analysis are relatively clear. The first factor (which was
by far the largest one to emerge from the factor analysis) is
presented in Table 4. This first factor seems to be definable as an


TABLE 4
Items Loading Above .35 on the First Centroid Factor for Study I

Item                              Loading

ME: frightened-fearless             .689
ME: worried-carefree                .640
ME: jittery-calm                    .638
ME: helpless-secure                 .636
HANDS: loose-tight                 -.605
ME: clear-hazy                     -.551
SCREW: nice-awful                  -.549
SCREW: clean-dirty                 -.547
MY MIND: loose-tight               -.529
BREATHING: careful-carefree         .521
FINGERS: stiff-relaxed              .520
HANDS: good-bad                    -.511
TOMORROW: loose-tight              -.500
SCREW: good-bad                    -.493
HANDS: wet-dry                      .483
YESTERDAY: clear-hazy              -.476
DREAMS: loose-tight                -.463
FINGERS: loose-tight               -.440
TOMORROW: clear-hazy               -.419
MOVIES: wet-dry                     .398
TODAY: clear-hazy                  -.396
MOVIES: loose-tight                -.392
GERMS: deep-shallow                 .391
ME: wet-dry                         [illegible]
TODAY: near-far                    -.353


anxiety factor. The rather obvious anxiety items (ME: frightened- 
fearless, etc.) are joined by a number of other items which are not 
obviously related to anxiety. It is interesting to note that of the 
25 items in Table 4, 18 are to be found on the six anxiety scales 
which were developed on the basis of group differences. The first 
factor accounted for 13.2 per cent of the total variance, which, 
though small, is much more than the variance accounted for by any 
of the other factors. All of the remaining factors were quite small, 
with none containing more than five items loading above .35. None 


8 See footnote 6. 
9 Factor 2 accounted for 5.9 per cent, factor 3 for 4.1 per cent, etc.




of the other factors was clearly interpretable. They seemed to be
“specific” factors formed by those items containing either the same
scale or the same concept.

An orthogonal rotation was also performed (using the quartimax 
procedure described by Neuhaus and Wrigley, 1954). The only 
change this produced was to drop a few items from Factor 1 and 
to raise the loadings on Factor 1 of most of the remaining items.
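
For readers unfamiliar with the centroid method, the sketch below shows how first-factor loadings and the percentage of total variance can be obtained from a correlation matrix. It is a simplified illustration under stated assumptions: the variables are taken to be already reflected so that every column sum is positive, the diagonal entries are used as supplied (unities here, although communality estimates are often substituted), and the 3 x 3 matrix is invented. It is not a reconstruction of the analysis reported above.

```python
import math


def first_centroid_factor(r):
    """First centroid loadings and per cent of total variance.

    `r` is a square correlation matrix (list of rows) whose variables are
    assumed to be reflected already, so that all column sums are positive.
    """
    n = len(r)
    col_sums = [sum(r[i][j] for i in range(n)) for j in range(n)]
    total = sum(col_sums)
    # Each loading is the column sum divided by the root of the grand total.
    loadings = [c / math.sqrt(total) for c in col_sums]
    pct_variance = 100.0 * sum(a * a for a in loadings) / n
    return loadings, pct_variance


# Invented correlations among three items.
r_matrix = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]
loadings, pct = first_centroid_factor(r_matrix)
print([round(a, 3) for a in loadings], round(pct, 1))
```

With many items and modest intercorrelations, as in the present data, the same computation yields a much smaller percentage than in this three-item example.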


Discussion of Study I 


In interpreting the results of Study I, we assume that anxiety or 
some very similar emotional condition was actually aroused in most 
subjects who viewed the surgical operation film. While there is no 
way to prove this assumption, there is some evidence which indi- 
cates it is a very reasonable one. In the preliminary screenings of 
the movie by the Institute of Communications Research staff mem- 
bers, the introspective consensus was that anxiety had been aroused. 
The comments of the subjects after the experimental sessions tended
to support this. The fact that a number of obvious anxiety-related 
items (e.g., ME: frightened-fearless, etc.) showed significant shifts 
as a result of the experimental treatment also supports the assump- 
tion that the state aroused was some form of anxiety. 

A sufficiently large number of items differentiated between the 
anxiety and control conditions to permit the construction of six 
provisional anxiety scales, developed for use under varying condi- 
tions. While this is encouraging, it does not answer the more im- 
portant question of whether or not the scales are valid indices of 
anxiety. 


When there is no entirely adequate criterion available for de- 
fining the quality or attribute being measured, then construct 




ts); b) The positive correlations with the anxiety factor of 
the Nowlis-Green adjective checklist lend some support to the
anxiety scales. While the adjective checklist measure is not
“fake-proof,” it has been used as a measure of momentary anxiety states,
and the positive correlations suggest that the Anxiety Differential 
also taps momentary anxiety states. 

While the results of Study I were promising, they were, of course,
not conclusive. A number of questions relative to cross-validation
and validity generalization merited further attention.

Cross-validation with another sample of subjects. Obviously the
next immediate step involved the cross-validation of items and
scales on a different sample of subjects.

Cross-validation under other stimulus conditions. It is important 
that stimulus-specific items (those sensitive only to the one stimu- 
lus used) be weeded out. If the AD is to have general applicability, 
it is essential that any possible specific effects of the particular film 
used be ruled out. 

Sex differences. The present results were obtained with an all-male 
sample. A number of studies have indicated that sex differences 
may lead to differential responses in situations where anxiety is
involved (Cowen & Obrist, 1958; Davis & Buchwald, 1957; Farber
& Spence, 1956; Heilizer, Axelrod & Cowen, 1956; Moffit & Stagner, 
1956; Postman, Bronson & Gropper, 1953). It is important to ex- 
amine whether men and women respond differently to the Anxiety 
Differential. 

Anonymous vs. personal testing. The subjects were tested in large
groups and were anonymous (since names were not required on the
test booklets). It is possible that in less anonymous testing
situations (i.e., similar to the kind in which much anxiety research is
actually performed) subjects would be more inclined to distort their
responses, resulting in a less sensitive measuring instrument.

A second study was undertaken to examine these problems
empirically.

STUDY II 


Method 


Subjects 
The subjects were 152 paid undergraduate students who were 
contacted in their regular classes and who volunteered to participate 




in a study “where you will be shown some filmstrips and will fill 
out some questionnaires." The experimental treatment was received 
by 41 males and 59 females; 26 males and 26 females served as 
controls. 


Stimulus Material 


The stimulus for the second experiment, chosen to arouse bodily
harm anxiety, was a color filmstrip depicting the results of highway
accidents.¹⁰ To enhance the anxiety-inducing qualities of the
film, a brief commentary was recorded for each of the 30 frames
that were shown to the subjects. The presentation of the filmstrip
and the recording lasted about 11½ minutes. A humorous filmstrip
of baby pictures was obtained for a control group, and also served
as a catharsis after the accident film for the experimental group.


Testing Procedure 


The subjects were tested individually. Each first responded to a 
37-item Anxiety Differential booklet (which took about five min- 
utes) containing all the items on the six anxiety scales developed in 
the first experiment, plus nine additional items which it was
hypothesized might relate to anxiety.

Each subject was then told that the first of several filmstrips 
would be shown, and that “This filmstrip has some pictures of acci- 
dents in it. If the film bothers you badly, feel perfectly free to ask 
me to stop the film at any time.” The subjects viewed the accident
filmstrip, and afterwards completed the “after” form of the test
materials. Next they saw the baby pictures. Lastly they answered
a number of questions—including informational items about the 
filmstrips, demographic questions, and open-ended questions con- 
cerning their feelings about the filmstrips. 

The palmar sweating index (Silverman & Powell, 1944) was also 
obtained for the first 50 experimental subjects before and after the 
accident filmstrip.¹¹ Correlations were obtained between the six


10 This filmstrip, Death on the Highway, is distributed by the Suicide Club of
Berkley, Michigan.

ng 
sweatii 
index of emotional tension. An exami ti ionship between the 
Anxiety Differential and a ph MM tie boves 
deemed desirable. 




scales of the Anxiety Differential and the palmar sweat pretest- 
post-test changes. Ranging from .03 to .11, none of the correlations 
was even close to statistical or practical significance. In addition 
the difference between the pretest and the post-test palmar sweating
scores was examined. The mean change was -.19; the t associated
with this change was .70, suggesting that there was no real
change. Since, as is noted later in the results section, the majority 
of the subjects reported being disturbed by the filmstrip, it was 
concluded that in this instance palmar sweating was not a sensi- 
tive measure. This conclusion led us to eliminate palmar sweat 
measurement for the remaining subjects. 

The control subjects viewed the baby pictures instead of the ac- 
cident filmstrip and did not see a second film. Otherwise the proce- 
dure was the same as for the experimental subjects. 


Results 
Effectiveness of the Anxiety-Inducing Treatment

There was some evidence that the highway accident filmstrip had 
the desired effect. Of the 100 experimental subjects, 64 volunteered 
comments (on the open-ended questions) indicating that the acci- 
dent filmstrip had been emotionally disturbing (“frightening,” 
“scary,” “nauseating,” “horrible,” “hard to look at,” “terrible,” 
etc.). Nine subjects asked the experimenter to stop the filmstrip;
they were, however, able to complete all the forms and were in- 
cluded in the sample. 

For the baby-pictures filmstrip, 93 of the 100 experimental sub- 
jects volunteered comments that it had been a very pleasant ex- 
perience. In the control group 38 of the 52 subjects volunteered such 
comments. The difference between the number of comments in the 
experimental group and in the control group was significant beyond 
the .01 level (as tested by chi-square). It was assumed that the 
difference was related to the fact that the experimental group 
viewed the accident film before the baby film. 
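
The chi-square comparison can be checked from the counts given in the text (93 of 100 experimental subjects and 38 of 52 control subjects commenting). The sketch below uses the standard shortcut formula for a 2 x 2 table; whether the authors applied a continuity correction is not stated, so both versions are computed, and the function name is an assumption.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square for the 2 x 2 table [[a, b], [c, d]] (1 degree of freedom)."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction, floored at zero.
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: experimental, control. Columns: commented, did not comment.
chi2 = chi_square_2x2(93, 7, 38, 14)
chi2_corrected = chi_square_2x2(93, 7, 38, 14, yates=True)
print(f"chi-square = {chi2:.2f}; with correction = {chi2_corrected:.2f}")
```

Both values exceed 6.635, the tabled value for the .01 level with one degree of freedom, which agrees with the result reported above.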


Cross-Validation of the Anxiety Scales 


The primary purpose of the present experiment was to cross- 
validate the items and scales developed for male subjects in the 
initial research. An additional purpose was to examine the results 
of women subjects. The means and standard deviations necessary 




TABLE 5
Means and Standard Deviations on the Six Anxiety Differential Scales
Used in Study II

                          Males                             Females
                Experimental      Control         Experimental      Control
                Mean    S.D.    Mean    S.D.      Mean    S.D.    Mean    S.D.
Scale One a      8.9     9.0    -1.1     4.1       5.5     8.9    -0.5     4.4
Scale Two a     10.8    11.4    -3.0     5.5       6.2    10.9     0.0     5.7
Scale Three a    8.9    10.4    -0.6     5.1       5.6     8.9    -0.7     5.0
Scale Four a     4.2     6.1    -1.7     4.9       2.9     5.3     0.3     4.7
Scale Five b    38.7     8.6    47.3     6.6      41.2     9.0    46.1     7.7
Scale Six b     58.6     9.4    66.4     7.4      59.6    10.0    66.1     8.5

a Higher score reflects higher anxiety on these scales.
b Lower score reflects higher anxiety on these scales.


to answer these questions for the six anxiety scales are presented 
in Table 5. 

The differences between the anxiety and control groups were ex- 
amined by means of t-tests. The results of the t-tests are presented 
in Table 6. 
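
A t of this kind can be approximated from summary statistics alone. The sketch below takes the Scale One values for males from Table 5 (means of 8.9 and -1.1, standard deviations of 9.0 and 4.1, with 41 experimental and 26 control subjects) and assumes an unpooled standard error. The report does not say which t formula was used, so the agreement with the tabled value should be read as approximate; the function name is hypothetical.

```python
import math


def t_from_summary(m1, s1, n1, m2, s2, n2):
    """Two-sample t from means, standard deviations, and group sizes.

    Uses an unpooled standard error; this is an assumption, since the
    source does not state the formula.
    """
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (m1 - m2) / se


# Scale One, males: experimental versus control.
t_scale_one_males = t_from_summary(8.9, 9.0, 41, -1.1, 4.1, 26)
print(f"t = {t_scale_one_males:.2f}")
```

Under these assumptions the result is about 6.18, close to the 6.17 listed for this comparison in Table 6.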

All six comparisons for the males yielded probabilities less than
.001. The scales also differentiated significantly for the women.
However, the mean differences for women were smaller than those
for men. Consequently, t-tests were obtained comparing the men
with the women on each of the six anxiety scales for both the
experimental and control conditions. These are presented in Table 7.
Although only one of the 12 t values was significant (they ranged


TABLE 6

Tests of the Significance of the Differences between
Experimental and Control Groups For Study II

                    Males                Females
                 t        P a         t        P a
Scale One      6.17     <.001       4.37     <.001
Scale Two      6.74     <.001       3.41      .001
Scale Three    4.90     <.001       4.14     <.001
Scale Four     4.33     <.001       2.24     <.05
Scale Five     4.05     <.001       2.54     .02-.01
Scale Six      3.82     <.001       3.07     <.01

a Using two-tailed tests.




TABLE 7

Tests of the Significance of the Differences between the Male and
the Female Groups for the Experimental and the Control Treatments

                  Experimental           Control
                  t         P          t         P
Scale One        1.86      .10        -.51
Scale Two        2.07      .05       -1.94      .10
Scale Three      1.66      .10         .07
Scale Four       1.12                -1.50
Scale Five      -1.43                  .60
Scale Six        -.50                  .14


from .07 to 2.07), several were sufficiently large to suggest that
future research explore the question of sex differences further.




Reliability 

Internal consistency estimates (utilizing coefficient alpha; see 
Cronbach, 1951) were obtained for each anxiety scale, both for the 
male experimental group and the female experimental group. The 
estimates are listed in Table 8 and seem to indicate that the anxiety 
keys possess adequate internal consistency reliability. 
It should be noted in examining the table that scales one through
four are based on difference scores and have lower reliabilities than
scales five and six, which are based on post-test scores only
(measures from a single testing generally being more reliable than change
scores).
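
Coefficient alpha (Cronbach, 1951) compares the sum of the item variances with the variance of the total score. A minimal sketch follows; the five subjects and three items are invented, and the function name is an assumption.

```python
from statistics import variance


def coefficient_alpha(scores):
    """Coefficient alpha for a list of subjects, each a list of item scores."""
    k = len(scores[0])
    item_variances = [variance([subject[j] for subject in scores])
                      for j in range(k)]
    total_variance = variance([sum(subject) for subject in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Invented responses: five subjects by three items.
scores = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
    [3, 4, 3],
]
alpha = coefficient_alpha(scores)
print(f"alpha = {alpha:.3f}")
```

For change-score scales the same computation would be applied to the item difference scores, which is one reason those estimates tend to run lower.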


Cross-Validation of the Individual Items 


Pretest-post-test differences. The results of the cross-validation 
study in terms of pretest-post-test differences for the individual 


TABLE 8
Internal Consistency Estimates for the Six Anxiety Scales

                Male Experimental Group     Female Experimental Group
Scale One               .62                          .68
Scale Two               .68                          .72
Scale Three             .61                          .57
Scale Four              .52                          .42
Scale Five              .80                          .85
Scale Six               [illegible]                  .79




items are very encouraging. Of the 37 items used in the experiment, 
16 displayed significant (P < .025, one tail) changes for the men. 
For the women, 15 items showed similar changes. 

A large number of items displayed pretest-post-test changes in 
both Study I and Study II. When the t-test results for the male 
subjects of the initial study and the male and the female subjects 
of the present experiment are examined, 17 items display substantial 
differences (P < .10, one tail) in at least two of the three groups. 
The 17 items and the t's are listed in Table 9.¹² For all 17 items the
direction of the difference is the same for the three groups of
experimental subjects.

Experimental-control change score comparisons. Using pretest-post-test
change scores for the males, 15 items displayed large
differences between the experimental and control groups (P < .025,
one tail); for the females, 12 items displayed similar differences. 
When the results of Study I are examined along with the results of 


TABLE 9

The Best Anxiety Differential Items When
Only the Pretest-Post-test Anxiety Group Differences are Considered

                                  Study I     Study II     Study II
                                   Group       Males        Females
Items a                              t           t             t

ME: frightened-fearless            1.70        3.35    [one value missing; column assignment uncertain]
DREAMS: loose-tight               -2.80    [illegible]      -1.83
MY MIND: loose-tight              -1.31       -2.47         -1.44
LITTLE BOYS: safe-dangerous       -8.21       -4.19         -3.46
FINGERS: straight-curved           -.53       -1.83         -2.53
ME: jittery-calm                   4.10        1.38          2.78
BREATHING: careful-carefree        4.84        4.83          4.82
FINGERS: loose-tight               -.27       -3.11         -3.20
HANDS: good-bad                   -8.78       -2.69         -1.84
HANDS: wet-dry                     3.07        1.19          1.08
SCREW: strong-weak                -3.04       -3.48         -2.82
ME: helpless-secure                5.64        2.71          3.48
FINGERS: stiff-relaxed             4.14        3.02          2.25
MOVIES: loose-tight               -2.92       -4.26         -2.60
ME: wet-dry                        4.06        1.80          1.32
SCREW: loose-tight                  .43        2.61          2.63
TODAY: curved-straight             1.80        1.64          1.03

a The anxious side for each item is italicized.


12 See footnote 6. 


the present study, 17 items yield consistent differences between the 
experimental and control groups. The 17 items and the t's associated
with them, for the initial anxiety study and for the males and fe- 
males of the present study, are presented in Table 10. 

Experimental-control post-test score comparisons. Twelve items 
which showed the largest and most consistent post-test score dif- 
ferences between the experimental and control treatments over all 
three groups of subjects were selected. The 12 items are listed in 
Table 11 along with their associated t-values. 

Factor analysis. For the factor analysis the male and female 
experimental groups were combined, providing a total of 100 sub- 
jects. The post-test scores of these subjects for the individual items 
were intercorrelated, and the correlations subjected to a centroid 
factor analysis. The first factor accounted for 16 per cent of the 
total variance of all the items, and had content quite consistent 
with the general conception of anxiety. The first factor was almost 




TABLE 10
The Best Anxiety Differential Items When Only the
Experimental-Control Differences for Change Scores
(Pretest Minus Post-test) are Considered

                                  Study I b    Study II b    Study II b
                                   Group         Males        Females
Items a                              t             t             t

ME: frightened-fearless            2.53          4.26          3.73
DREAMS: loose-tight               -3.17         -3.90         -2.69
MY MIND: loose-tight              -2.57         -3.96         -1.02
LITTLE BOYS: safe-dangerous       -2.31         -4.16         -2.75
ME: jittery-calm                   4.36          2.36          2.75
BREATHING: careful-carefree        3.50          2.91          2.39
FINGERS: loose-tight              -2.07         -3.42         -1.69
SCREW: strong-weak                -3.24         -4.55         -3.65
GERMS: deep-shallow                1.48          2.43          2.44
ME: helpless-secure                3.12          2.79          2.29
SCREW: nice-awful                 -1.93         -1.29         -1.76
FINGERS: stiff-relaxed             3.21          4.55          2.65
MOVIES: loose-tight               -1.95         -3.56         -3.98
SCREW: loose-tight                 1.39          2.28          1.65
HANDS: good-bad                   -1.54         -1.06          -.76
DREAMS: near-far               [illegible]       1.40          1.51
HANDS: wet-dry                     3.48          1.07           .64

a The anxious side for each item is italicized.
b The t's are for the differences between experimental and control groups for change
scores (pretest minus post-test).




TABLE 11

The Best Anxiety Differential Items when only the Experimental-Control Differences
for Post-test Scores are Considered and the Loadings of the Items on the First Factor

                                 Study I    Study II    Study II    Loading on
                                  Group      Males      Females     First Factor
Items a                             t          t           t         Study II

ME: frightened-fearless          -1.30      -3.21       -3.67          .557
DREAMS: loose-tight               2.92       1.53        2.31         -.585
MY MIND: loose-tight              2.75       2.52         .80         -.640
ME: jittery-calm                 -2.41      -1.41       -2.22          .604
BREATHING: careful-carefree      -4.45      -3.18       -1.24          .489
FINGERS: loose-tight              2.14       3.12        2.54         -.621
SCREW: strong-weak                1.39       2.74        3.59         -.380
ME: helpless-secure              -2.50      -1.83       -2.47          .644
SCREW: nice-awful                 1.49       1.40        3.41         -.420
FINGERS: stiff-relaxed           -2.54      -3.72       -1.25          .675
MOVIES: loose-tight               4.02       2.56        3.48         -.409
SCREW: loose-tight                -.78      -2.31       -2.59          .352

a The anxious side for each item is italicized.


identical to the first factor obtained in Study I (see Table 4), which 
was interpreted as a clear anxiety factor. 

Since post-test scores were used in the factor analysis, the load- 
ings of the items which displayed large post-test score differences 
between the anxiety and control groups are of interest. The first 
centroid factor loadings of the 12 “best” post-test items are also 
presented in Table 11. All of the 12 items load above .35 on the 
anxiety factor. As in Study I, the results of the factor analysis are
quite consistent with the results obtained from the examination of
differences between treatment conditions.


Discussion 

All six of the exploratory anxiety scales developed in Study I
differentiated between anxiety and non-anxiety groups in Study II.
In addition these scales, which had been developed for male
subjects in the initial study, were successful for female subjects in the
second experiment. The scales also demonstrated reasonable
reliabilities on cross-validation. However, while the reliabilities
reached an acceptable level, their size also clearly indicated that
further development and refinement of the anxiety measures was
in order.




Thus, while the six anxiety scales developed in Study I held up
in Study II, it was deemed useful to refine and strengthen them on 
the basis of the additional data provided by Study II. Three Anx- 
iety Differential tests are suggested on the basis of the results of 
the two experiments. 

Test 1. The pretest-post-test measure: When a before-after ex- 
perimental design is employed, but no control group is available, 
the items in Table 9 may be most appropriate. 

Test 2. The pretest-post-test experimental-control measure: When 
a control group is utilized in conjunction with a before-after de- 
sign, the items listed in Table 10 seem to be best. Our experience 
suggests this may be the best design for investigating incremental 
anxiety. 

Test 3. The post-test-only measure: Where the experimenter 
chooses an after-only design, or pretests cannot be obtained, the 
items in Table 11 may be most appropriate. It is, of course, essen- 
tial that a control group be available. 

It should be noted here that there is a great deal of item overlap 
among these three tests. Therefore, it is possible to construct an 
“all-purpose” measure for the experimenter who desires one test 
that can be used under various experimental conditions. Items which 
are included in at least two of the three “specialized” tests might 
compose this “all-purpose” test. There are 15 such items, consisting 
of the 12 items in Table 11, plus LITTLE BOYS: safe-dangerous, 
HANDS: wet-dry, and HANDS: good-bad. 
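
The "all-purpose" rule (items appearing in at least two of the three specialized tests) can be verified directly from the item lists in Tables 9, 10, and 11. The sketch below does so with a simple count; the variable names are assumptions, and the item labels are transcribed from the tables.

```python
from collections import Counter

table_9 = {
    "ME: frightened-fearless", "DREAMS: loose-tight", "MY MIND: loose-tight",
    "LITTLE BOYS: safe-dangerous", "FINGERS: straight-curved",
    "ME: jittery-calm", "BREATHING: careful-carefree", "FINGERS: loose-tight",
    "HANDS: good-bad", "HANDS: wet-dry", "SCREW: strong-weak",
    "ME: helpless-secure", "FINGERS: stiff-relaxed", "MOVIES: loose-tight",
    "ME: wet-dry", "SCREW: loose-tight", "TODAY: curved-straight",
}
table_10 = {
    "ME: frightened-fearless", "DREAMS: loose-tight", "MY MIND: loose-tight",
    "LITTLE BOYS: safe-dangerous", "ME: jittery-calm",
    "BREATHING: careful-carefree", "FINGERS: loose-tight",
    "SCREW: strong-weak", "GERMS: deep-shallow", "ME: helpless-secure",
    "SCREW: nice-awful", "FINGERS: stiff-relaxed", "MOVIES: loose-tight",
    "SCREW: loose-tight", "HANDS: good-bad", "DREAMS: near-far",
    "HANDS: wet-dry",
}
table_11 = {
    "ME: frightened-fearless", "DREAMS: loose-tight", "MY MIND: loose-tight",
    "ME: jittery-calm", "BREATHING: careful-carefree", "FINGERS: loose-tight",
    "SCREW: strong-weak", "ME: helpless-secure", "SCREW: nice-awful",
    "FINGERS: stiff-relaxed", "MOVIES: loose-tight", "SCREW: loose-tight",
}

# Count how many of the three specialized tests contain each item.
counts = Counter(item for table in (table_9, table_10, table_11) for item in table)
all_purpose = sorted(item for item, c in counts.items() if c >= 2)
extras = set(all_purpose) - table_11

print(len(all_purpose))
print(sorted(extras))
```

The count comes to 15 items, and the three items beyond Table 11 are the ones named in the text.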


Comments on the Adequacy of the Anxiety Differential


The Anxiety Differential is apparently able to differentiate be- 
tween “anxious” and “non-anxious” states of the same individuals, 
and also between anxious and non-anxious groups. The anxiety 
scales developed in the initial research displayed adequate internal 
consistency reliabilities in the second experiment. The factor ana- 
lytic results from the two studies suggest that the anxiety measures 
are homogeneous and represent anxiety-content items. 

In addition to this evidence concerning the validity and reliability 
of the Anxiety Differential measures, certain other aspects of the 
experimental designs used in the two experiments are relevant to 
the evaluation of the anxiety measures. Different stimuli were used 
for the initial experiment and for the cross-validation study. The 




initial results were obtained for male subjects, but both males and 
females were used in the second experiment. The first experiment 
used group testing, the second was performed with individual test- 
ing. The items in the initial experiment were presented in two dif- 
ferent random orders; the items in Study II were presented in still 
a third order. The experimenter who obtained the data for the 
second experiment was not involved in the administration of the 
initial experiment. Thus, the Anxiety Differential measures do not 
seem to be unduly limited in their applicability. 

Cautions concerning the Anxiety Differential. At this point a 
number of cautionary remarks about the proposed tests are in order. 

(1) Both studies were performed with stimuli that were selected 
to elicit “bodily harm” anxiety. That is, both anxiety treatments 
concerned physical dangers and pain (a surgical operation and 
highway accidents). 

If different types of anxiety evoke different reactions in subjects,
the proposed tests might not be appropriate for other types of
anxiety. A number of types of anxiety that might require different
measures can be identified.


a. Bodily harm anxiety: induced by some threat to a person’s
physical integrity.

b. Failure anxiety: induced by anticipated or actual failure
(whether realistic or not) in a situation where a person feels
that he is on trial.

c. Moral anxiety (guilt): induced by a person's awareness of
an ethical violation committed by himself, or by some
reference group of which he is a member.

d. Rejection anxiety: induced by the withholding of love by a
cathected figure (or loss of love).

e. Aggression anxiety: resulting from an awareness by the
individual that he has been hostile or aggressive.

f. Value anxiety: induced by threatening some value held by a
person.


Whether or not the Anxiety Differential is adequate to measure
other types of anxiety (such as failure or moral anxiety) cannot be
determined without further research.

(2) The tests were developed on the basis of group differences.
The identification, for diagnostic purposes, of the amount of anxiety




possessed by an individual subject has not been treated. Moderately 
reliable individual differences do seem to exist (see the correlations 
with the Nowlis-Green adjective checklist obtained in Study I, the 
internal consistency estimates, and the factor analytic results). 
However, the research has not concerned the use of the Anxiety 
Differential as a diagnostic instrument or as a measure of the anx- 
iety state of an individual subject. 

(3) The problem of distortion or “faking” of responses must also 
be raised. It will be recalled that scale 4 in Study I was developed 
specifically to be used in situations where subjects were very likely 
to falsify their responses. While scale 4 would appear to be the 
most appropriate measure for such instances, a strong note of cau- 
tion is in order. The statistical criteria used to choose items for scale 
four were (of necessity) the least statistically stringent of all (.20 
level of confidence), and it may be expected that this scale is likely 
to be the measure least sensitive to anxiety changes. It would be 
better experimental strategy, wherever possible, to reduce the sub- 
ject’s motivation to falsify responses and to use one of the other 
measures as the index of anxiety. The results from Study II lend 
strong support to the view that the other measures ordinarily should 
not be seriously affected by the kinds of “faking” likely to occur in 
the typical psychological experiment. Despite the personal, non- 
anonymous testing situation of Study II, the AD yielded clear dif- 
ferences between experimental and control groups. It is likely that 
the speed with which subjects are urged to respond, and the non- 
obvious content of many of the items make difficult the conscious 
distortion of responses. Of course, it is still possible that either 
strong motivation to falsify or unconscious distortion of responses 
(e.g., response sets) may contribute unwanted variance to the meas- 
ures. Further research should be directed toward such questions. 

(4) Although the two experiments used different stimuli to 
arouse anxiety, in both cases the stimulus was a picture projected 
on a screen (a movie in Experiment 1, a filmstrip in Experiment 2). 
It is possible that artifacts related to the similarity of the stimuli 
have affected the item results (e.g., MOVIES: loose-tight may or
may not prove sensitive to anxiety aroused by other means). Al- 
though these artifacts may not be too important since the testing 
situations were quite different, their possible presence should be 
noted. 




(5) Although the suggested anxiety tests were developed on the 
basis of the results for both male and female subjects, the females 
tended to have somewhat lower scores on the six original anxiety 
scales developed in Study I. Should there really be some difference 
between male and female responses, two possible explanations are 
suggested. Either, (a) the anxiety treatment used in this experiment 
aroused less anxiety in the women than the men, or (b) the anxiety 
scales (developed originally for an all-male sample) were not quite 
as sensitive for women as for men. The pattern of signs in Table 7 
also suggests the possibility of an interaction of sex and treatment 
conditions. Thus, the results reported in Table 7, while in no way 
conclusive about sex differences, at least suggest that this matter 
should receive further investigation. 

(6) The choice of tests in the present investigations was based 
on the responses of college students. It is certainly possible that 
differences in age and in educational level may affect responses to 
measures such as these, which are based on verbal associations. 

(7) Although test 3, the post-test measure, is based on the analysis
of post-test scores only, it should be noted that pretest results
were also obtained for these subjects. The effect of the
administration of the pretest on the post-test scores is not known.
It is possible that the post-test measure, test 3, may produce
different results when a pretest is not administered to the same
subjects.


Summary 


Two studies involving the development and cross-validation of
a measuring instrument for situational anxiety were described. In
Study I, six exploratory anxiety scales were constructed on the
basis of comparisons between anxious and non-anxious subjects. 
Partial evidence for the validity of the scales was described. The 
second study attempted to cross-validate and refine the Anxiety 
Differential measures. Even though experimental conditions dif- 
fered along several dimensions from those of Study I, the results 
indicated that the anxiety scales remained sensitive and reliable. 
The anxiety measures were further refined, leading to the develop- 
ment of several tests to be used with different experimental de- 
signs. A number of advantages and possible limitations of the Anx- 
iety Differential were also discussed. 




REFERENCES 


Blake, R. R. and Mouton, J. S. “Personality.” Annual Review of
Psychology, X (1959), 203-232.

Cowen, E. L. and Obrist, P. A. “Perceptual Reactivity to Threat 
and Neutral Words under Varying Experimental Conditions.” 
Journal of Abnormal and Social Psychology, LVI (1958),
305-310. 

Cronbach, L. J. “Coefficient Alpha and the Internal Structure of 
Tests.” Psychometrika, XVI (1951), 297-334. 

Cronbach, L. J. and Meehl, P. E. “Construct Validity in
Psychological Tests.” Psychological Bulletin, LII (1955), 281-302.

Davis, R. C. and Buchwald, A. M. “An Exploration of Somatic 
Response Patterns: Stimulus and Sex Differences.” Journal of 
Comparative and Physiological Psychology, L (1957), 444-452. 

Dibner, A. S. “Ambiguity and Anxiety.” Journal of Abnormal and
Social Psychology, LVI (1958), 165-174. 

Farber, I. E. and Spence, K. W. “Effects of Anxiety, Stress, and 
Task Variables on Reaction Time.” Journal of Personality, 
XXV (1956), 1-18. 

Goodstein, L. “Inter-relationships among Several Measures of
Anxiety and Hostility.” Journal of Consulting Psychology, XVIII
(1954), 35-39. 

Green, R. F. and Nowlis, V. “A Factor Analytic Study of the
Domain of Mood with Independent Experimental Validation of the
Factors.” American Psychologist, XII (1957), 438. (Abstract)

Heilizer, F., Axelrod, H. S., and Cowen, E. L. “The Correlates of 
Manifest Anxiety in Paired Associate Learning.” Journal of
Personality, XXIV (1956), 463-474. 

Jenkins, J. J. and Lykken, D. T. “Individual Differences.” Annual 
Review of Psychology, VIII (1957), 79-112. 

Jensen, A. R. “Personality.” Annual Review of Psychology, IX
(1958), 295-322. 

Mandler, G., Lindzey, G., and Crouch, R. G. “Thematic Apperception
Test: Indices of Anxiety in Relation to Test Anxiety.”
EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, XVII (1957), 
466-474. 

Martin, B. “A Factor Analytic Study of Anxiety.” Journal of 
Clinical Psychology, XIV (1958), 133-138. 

Moffit, J. W. and Stagner, R. “Perceptual Rigidity and Closure as 
Functions of Anxiety.” Journal of Abnormal and Social
Psychology, LII (1956), 354-357.

Mowrer, O. H., Light, B. H., Luria, Zella, and Zeleny, Marjorie P.
“Tension Changes in Psychotherapy with Special Reference to 
Resistance.” In O. H. Mowrer (Editor), Psychotherapy: Theory 
and Research. New York: Ronald Press, 1953, 546-640. 

Neuhaus, J. O. and Wrigley, C. F. “The Quartimax Method: An 
Approach to Orthogonal Simple Structure.” British Journal of 
Statistical Psychology, VII (1954), 81-92.

Nowlis, V. and Green, R. F. “The Experimental Analysis of Mood.”
Acta Psychologica, XV (1959), 426-427. (Abstract) 




Osgood, C. E., Suci, G. J., and Tannenbaum, P. H. The Measure- 
ment of Meaning. Urbana: University of Illinois Press, 1957. 

Postman, L., Bronson, Wanda C., and Gropper, G. L. “Is There a
Mechanism of Perceptual Defense?” Journal of Abnormal and
Social Psychology, XLVIII (1953), 215-225.

Raphelson, A. C. “The Relationships among Imaginative, Direct 
Verbal, and Physiological Measures of Anxiety in an Achieve- 
ment Situation.” Journal of Abnormal and Social Psychology, 
LIV (1957), 13-18. 

Silverman, J. J. and Powell, V. E. “Studies on Palmar Sweating:
Part I. A Technique for the Study of Palmar Sweating.”
American Journal of the Medical Sciences, CCVIII (1944), 297-305.

Thorndike, R. L. Personnel Selection. New York: John Wiley & 
Sons, 1949.

Thurstone, L. L. Multiple-Factor Analysis. Chicago: University of
Chicago Press, 1947. 


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 2, 1962


THE IDENTIFICATION OF GIFTED ELEMENTARY 
SCHOOL CHILDREN WITH EXCEPTIONAL 
SCIENTIFIC TALENT¹


GERALD S. LESSER, FREDERICK B. DAVIS,
AND LUCILLE NAHEMOW 
Hunter College 


RECENT international events have focused attention upon the 
urgent need for early identification of exceptional science talent. 
The scarcity of skilled scientific personnel and the inability of our 
schools to capitalize fully upon the science potential of students 
emphasize the demand for early recognition of science ability. The
purpose of this research was to construct an instrument for identi- 
fying scientifically gifted children at an early age. The specific 
hypothesis tested is that a test, especially constructed to predict 
the achievement of gifted children in a third-grade science class, 
will yield a significant positive correlation with the criterion (a 
weighted composite of scores in science achievement tests adminis- 
tered 3-9 months later). 

In recent decades, several extensive programs have been estab- 
lished at secondary school and college levels for the identification 
and training of students with superior ability in science. Among the 
projects that have shown promise in the prediction or development 
of science ability are the Annual Science Talent Search (Edgerton 
& Britt, 1944, 1947), the National Merit Scholarship Program 
(National Merit Scholarship Corporation, 1959), the Longitudinal 
Study of Career Development of Scientists (Cooley, 1958, 1959), 
and the Joint Program for Technical Education (Joint Program
for Technical Education, 1957). Many other valuable contributions
(e.g., U. S. Office of Education, 1952; Seashore, 1952) have been
made to the development of techniques for the selection of
secondary school and college students with science potential. However,
efforts to identify science aptitude should not be restricted to
secondary school and college levels. The earlier the identification of
students with science potential can be accomplished, the more
encouragement and guidance these students can be given.

¹ This research was supported jointly by the Cooperative Research Program
(Grant No. 392) of the U. S. Office of Education and by Hunter College.

Much of the literature concerning science talent has centered
upon two questions: (a) does science aptitude represent a single,
clearly defined entity? and (b) if science aptitude is to some degree 
a circumscribed, unitary variable, what are its distinguishing com- 
ponent characteristics? Certain investigators (e.g., Davis, 1951; 
Fehr, 1953; Guilford, 1950; Subarsky, 1948), describing mainly the 
abilities of secondary school children, contend that there is a clearly 
defined trait of science aptitude, analogous to musical and artistic 
aptitudes. There is much disagreement, however, in the specification 
of the components of this science aptitude. Guilford (1950), for ex- 
ample, emphasizes sensitivity to problems, ability to develop novel 
ideas, and the ability to evaluate. Subarsky (1948) stresses special- 
ized and persistent curiosity, alertness in detecting inconsistencies, 
and a high degree of mechanical-mindedness. Cole (1956) enu- 
merates the characteristics of industry, devotion to work, energy, 
and initiative among the components of science aptitude, while 
Super and Bachrach (1957) list spatial visualization, manipulative 
ability, ability to plan and design a study, and ability to
communicate.

Other authorities (e.g., Brandwein, 1951, 1955) argue that there
is no single trait of science talent per se, but that science talent is 
one aspect of high general intelligence which emerges through en- 
vironmental stimulation and available opportunities for develop- 
ment. Edgerton (1959), Roe (1952), MacCurdy (1956), and others 
have referred to the personality and developmental influences in 
science ability. Cooley (1958) suggests that there is no isolated, 
unitary set of characteristics which distinguishes science aptitude 
and recommends multivariate, longitudinal analyses of personality, 
social, and motivational variables in the study of science ability. 

However, even those investigators (e.g., Freeman, 1955) who regard
science aptitude as a multi-dimensional conglomerate of [traits]
agree that prediction of success in scientific studies and [careers]
is feasible with an appropriate combination of variables. The
prediction of science achievement by tests of science
aptitude does not necessarily imply that such a science-aptitude 
test measures psychological functions wholly different from those 
used in other types of mental activity. 

The results of an earlier study (Davis, Lesser & French, 1960) 
encourage further work upon the development of techniques for 
measuring science aptitude in young children. This study included 
the measurement of knowledge of scientific principles and informa- 
tion (as well as space conceptualization, vocabulary, number abil- 
ity, and reasoning) in groups of children four through six years of 
age. The test of science displayed adequate reliability and yielded 
a substantial percentage of nonchance variance estimated to be 
unique among the five tests employed. The study also supplied 
some evidence that five-year-old children selected as outstanding 
in science ability, on a preliminary form of the tests, respond dif- 
ferently from other children to scientific material presented in the 
classroom. 


Characteristics of the Hunter Science Aptitude Test 


Two forms of a 91-item Hunter Science Aptitude Test were con- 
structed. The outline of the test is presented in Table 1. The item 
categories were established through (a) the selection of the com- 
ponents of science aptitude about which there is consensus in the 
literature, and (b) the inclusion of abilities appropriate to the popu- 
lation of six- and seven-year-old children. 

The test outline indicates that items were divided into “items in 
groups” and “single items.” The “items in groups” were developed 
for experimental use to assess the abilities listed in category III 
(ability to apply principles in making predictions) and category 
IV (ability to use the scientific method) of the test outline. These 
“items in groups” are presented in a serial arrangement; there are 
12 series of from 3 to 5 items each in Form AX and in Form BX. 
Thus, in each form, 46 items of the 91 are arranged in groups. A 
sample series from Form BX is presented in Table 2. This serial 
method of presentation, in which a child is tested for his ability to 
learn new material at the same time that his prior knowledge of 
material is being tested, represents an effort to assess the child’s
ability to combine elements into principles as well as his ability to
apply the scientific method. The basic premise underlying this form
of testing is that, with very young children, the ability to organize 
available facts into a meaningful concept is at least as indicative 
of scientific aptitude as is the sheer number of facts that have been 
accumulated. 

In this serial testing arrangement, if the child answers incorrectly
or does not know the answer to any of the first three items in the
group, it is scored as incorrect, and then the answer is explained to
him. This explanation, like the question, is in prescribed form. Each
item is scored correct or incorrect before the explanation is given
by the examiner. Thus, items early in each series measure information
that the child acquired prior to the test situation, while the
solution of later items in the series requires comprehension of the
explanations given for the previous items in the series and the
application of those explanations to a new situation. This technique
of presenting items in meaningful groups permits scoring pupils on
their knowledge of facts while at the same time providing them with
additional facts they need for educing principles. They are then
scored on their ability to educe principles on the basis of the facts
they already knew or have been told during the testing.


TABLE 1
Outline for Hunter Science Aptitude Test

                                              FORM AX            FORM BX
Scientific Aptitude                       Items in  Single   Items in  Single
Behavior Objectives                        Groups   Items     Groups   Items

I.   Ability to recall information
     A. Knowledge of scientific
        vocabulary                            3       2          3       2
     B. Knowledge of scientific
        principles                            3       5          3       5
     C. Knowledge of tools and
        scientific instruments                2       4          2       4
     D. Knowledge about the natural
        environment                          11       5         11       5

II.  Ability to assign meanings to
     observations
     A. Formulation or verbalization
        of a principle to explain an
        effect described                      8       5          8       5
     B. Identification of crucial
        elements of a problem                 5       5          5       5

III. Ability to apply principles in
     making predictions
     A. Utilization of available
        information in novel situations       2       3          2       3
     B. Utilization of a scientific
        principle in a familiar
        situation                             5       6          5       6
     C. Analysis of the factors
        influencing predictions               5       6          5       6

IV.  Ability to use the scientific
     method
     A. Planning steps leading to a
        solution                              2       4          2       4

Total:                                       46      45         46      45


TABLE 2
Sample of Series Items from Hunter Science Aptitude Test

Series 5
Subject          Date
Class            Age            Examiner

Question                                     Expected Answer           Score

A. “What do we call a space where there
   is nothing, not even air?”                “A vacuum”                  2
                                             “Space,” “hole”             0
   Explanation: “A vacuum. It's a place
   with nothing in it, not even air.”

B. Show child two pieces of paper. Give
   child one.
   “These pieces of paper are exactly the
   same. Crumple this one into a ball.
   Now is one paper heavier than the
   other?”                                   “No”                        2
                                             “Yes”                       0
   Explanation: “The papers weigh the same
   amount. The crumpled paper weighs the
   same amount as it did before you
   crumpled it. It’s still the same piece
   of paper.”

C. Show the child two pieces of paper,
   one crumpled, one uncrumpled.
   “Now watch what I do.”
   E drops both papers to the floor from
   table height.
   “Why does this piece of paper take        Child indicates
   longer than the crumpled one to reach     (1) greater surface
   the floor?”                               area, more “pushing”
                                             room for air, and
                                             (2) air pushes against
                                             paper                       2
                                             Either part alone           1
                                             Neither part                0
   Explanation: “The flat one has a larger
   surface; there is more air pushing on
   it as it falls.”

D. Show child the same two pieces of
   paper.
   “If we were to drop both of these
   papers in a vacuum, would the crumpled    “No.” (Q) “There is no
   one reach the floor first?”               air to push on paper.”      2
                                             “No.” (Q) Incorrect
   If child answers “No,” E asks,            reason given.               0
   “Why not?”                                “Yes.”                      0

Two judges independently assigned each of the 182 items in the 
Hunter Science Aptitude Test to the categories presented in the test 
outline. The degree of inter-judge agreement in categorizing items 
is indicated in a Coefficient of Concordance (W) of .80.

Other principles followed in constructing the tests were as fol- 
lows: First, since the children to be tested were 6 and 7 years of 
age, individual administration seemed necessary. This allowed for 
variations in the child’s attitudes, provided the motivation that 
skilled examiners can engender through personal contact, and re- 
duced errors of measurement owing to distractions. Second, an at- 
tempt was made to maintain the children’s interest by including a 
variety of pictorial materials and a number of items that allowed 
the children to manipulate test objects. Third, the test items re- 
quired no reading on the children’s part, thus minimizing any effect 
on the scores attributable to the wide individual differences that 
exist among six- to seven-year-old children in reading ability.
Fourth, all directions and explanations were worded as simply and
directly as possible, and as many as possible of the words that
children themselves were found to use were incorporated in the
wording of directions, items, and explanations. Finally, all tests
were unspeeded. This requirement excluded the effects of individual
differences among the children in reaction time, ability to
manipulate materials, etc.


Testing 


Both forms of the Hunter Science Aptitude Test were adminis- 
tered to the 58 children in the third grade at Hunter College Ele- 
mentary School just prior to and at the beginning of the school year 
1959-1960; all testing was completed before formal science lessons
began in the classroom. Because of the time pressures involved in
scheduling the administration of 116 tests within a two-week pe- 
riod, the different forms of the test were administered to each child 
by different examiners.² Each session required about one hour of
testing time. These tests were scored and the results kept confi- 
dential by one of the investigators (FBD). In order to eliminate 
any chance of contamination between the Hunter Science Aptitude 
Test and the criterion measures obtained during the course of the 
school year, neither anyone in the Hunter College Elementary 
School nor the other investigator (GSL), who was responsible for 
the planning, administration, and scoring of the criterion measure, 
had any knowledge of the Hunter Science Aptitude Test scores. 

The children tested ranged in age from 6 years, 9 months to 7 
years, 9 months. Their Stanford-Binet (1937 Revision) Intelligence 
Quotients ranged from 136 to 171, with a mean score of 150.4. The 
children come from a large district in Manhattan; their families 
constitute a relatively homogeneous middle-class group. The school 
population of Hunter College Elementary School has been
described in detail elsewhere (Brumbaugh & Roshco, 1959; Hunter
College Elementary School, 1959). 


Results 


Hunter Science Aptitude Test: Each response to the Hunter 
Science Aptitude Test items was assigned a score of 2, 1, or 0. The 
maximum range of scores on each form of the test is therefore from
0 to 182. The actual scores for Form AX ranged from 46 to 131,
with a mean of 77.83 and a standard deviation of 19.40. For Form
BX the scores ranged from 26 to 115, with a mean of 70.17 and a
standard deviation of 23.26. The variances of Form AX and Form
BX are not significantly different at the 5 per cent level (F =
1.44; n1 = n2 = 57).
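
The variance ratio quoted above can be checked from the two reported standard deviations. The following is a minimal sketch in Python; the function name is illustrative only, and the 5 per cent critical value of F is not computed here.

```python
def variance_ratio(sd_a: float, sd_b: float) -> float:
    """Return the larger sample variance divided by the smaller one."""
    var_a, var_b = sd_a ** 2, sd_b ** 2
    return max(var_a, var_b) / min(var_a, var_b)


# Standard deviations reported for Form AX (19.40) and Form BX (23.26).
f_ratio = variance_ratio(19.40, 23.26)
print(round(f_ratio, 2))
```

The larger variance is placed in the numerator so that the ratio can be referred to the upper tail of the F distribution.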

The parallel-form reliability coefficient for Forms AX and BX
is .64. Despite the fact that the variances of the two forms were not 
significantly different, Angoff's equation Number 6 (Angoff, 1953, 
p. 4) was applied in order to estimate the reliability of each form 
of the test separately from the correlation between the parallel
forms and their variances. Form AX manifests a reliability
coefficient (r_AA = .61) somewhat lower than Form BX (r_BB = .67).

Considering the length of the Hunter Science Aptitude Test, these
appear to be unusually low reliability coefficients. The low average
age of the children tested, the variety of the facts and concepts
tested, the fact that different examiners administered the two forms
of the test, and the difficulty of many of the items may have been
contributing factors in producing these low reliability coefficients.

² We are indebted to Mr. Morton Spitzer for his contribution to the
administration and scoring of these tests.

Predictive Validity Criterion—The Science Achievement Tests: 
Hunter College Elementary School provides unique facilities and 
teaching conditions for the study of science at the elementary 
school level. Laboratory tables, built along dimensions suitable to 
young children, are available for children at all grade levels; at 
these laboratory tables, children can individually manipulate equip- 
ment and apparatus and conduct their own experiments under the 
science teacher's guidance. A science specialist teaches this subject 
to all elementary school grades. Lectures, demonstrations, and in- 
dividual experimentation are all combined in the science curriculum. 
One hour per week is devoted to science in the third-grade classes 
used in this research. 

The criterion used in this study was a weighted composite of 
objective test scores in science achievement obtained by each child 
throughout the course of the school year. Seven science achieve- 
ment tests were constructed to evaluate the child’s learning of the 
following topics covered in 30 one-hour class sessions.³ After the
last class session devoted to a particular topic, the appropriate
science achievement test was administered. The topics covered were:


  I. Water: Its Importance in the World; Techniques for
     Purification (4 sessions).
 II. Water: How Water is Used for Cleaning Purposes (4 sessions).
III. Water: Forms in Which Water Appears in Air (3 sessions).
 IV. Water: Archimedes' Principle (7 sessions).
  V. Water: Water Pressure; Water Seeks Its Own Level; the
     Cartesian Diver (4 sessions).
 VI. Water: Surface Tension, Cohesion and Adhesion (5 sessions).
VII. Sound: Vibration and Transmission (3 sessions).

³ Mr. Manuel Pantezonis, science teacher at Hunter College Elementary
School, provided invaluable advice concerning the content of the achievement
test items.


Seven objective examinations (including 140 items in all) were 
constructed to measure the subjects’ achievement in these areas. 
These objective examinations included multiple-choice (three- 
choice) and true-false items. Each test was administered to ap- 
proximately 15 children at one time; thus, each was given four 
times. Each item was read twice to the children, and, in the case 
of the multiple-choice items, the response alternatives were written 
on the blackboard and were read aloud. Before the administration 
of the first achievement test, a brief practice period acquainted the 
children with the form of the testing. The same form was employed 
throughout the course of the seven achievement tests; no mechani- 
cal difficulties were encountered in the children’s ability to under- 
stand and follow the directions and content of the questions. 

The range, mean, standard deviation, and odd-even reliability 
coefficient for each science achievement test are presented in Table 
3. The intercorrelations (Pearson product-moment correlation co- 
efficients) among these science achievement tests are presented in 
Table 4. 

In order to obtain a single, composite score for each subject from
the seven science-achievement tests for use as a criterion with
which to correlate the Hunter Science Aptitude Test scores, each
of a given child’s seven raw science-achievement scores was
multiplied by a weight so determined as to make the per cent of the
criterion-score variance attributable to each of the seven tests
essentially the same as the per cent of total science-class time
devoted to each topic taught. These raw-score weights and the per
cent of criterion-score variance contributed by each science
achievement test are presented in Table 5. The seven weighted raw
scores were then summed to obtain a single, composite science
achievement score for each child. The weighted science-achievement
scores ranged from 9.7 to 32.5, with a mean of 19.23 and a standard
deviation of 5.07. The close correspondence between the per cent of
class time devoted to each unit and the per cent of composite-score
variance contributed by it is obvious in Table 5.


TABLE 3

Ranges, Means, Standard Deviations, and Odd-Even
Reliability Coefficients for Science
Achievement Tests

               Actual                     Standard    Reliability
             Score Range    N     Mean    Deviation   Coefficient*
Unit I          10-19      54    14.53      1.96         .34
Unit II         10-20      56    16.27      2.02         .74
Unit III        12-22      56    17.00      2.41         .53
Unit IV         11-18      57    14.18      1.57         .39
Unit V           7-16      55    12.22      1.87         .51
Unit VI         10-19      55    14.45      2.29         .52
Unit VII         8-18      54    15.00      2.00      [illegible]

* Pearson product-moment half-test correlation coefficient, corrected by means of Angoff
equation No. 16 (Angoff, 1953, p. 7).


TABLE 4
Intercorrelations Among Science Achievement Tests*

            Unit I  Unit II  Unit III  Unit IV  Unit V  Unit VI  Unit VII
Unit I
Unit II      .22
Unit III     .54     .97
Unit IV      .44     .24      .29
Unit V       .23     .23      .43      .46
Unit VI      .33     .29      .43      .85      .41
Unit VII     .33     .21      .51      .30      .28     .18

Mean        14.53   16.27    17.00    14.18    12.22   14.45    15.00
S.D.         1.96    2.02     2.41     1.57     1.87    2.29     2.00

* When N = 54, an intercorrelation of ±.27 is significantly different from zero at the 5 per
cent level; an intercorrelation of ±.35 at the 1 per cent level.


TABLE 5

Per Cent of Class Time Devoted to Each Science Unit, Raw-Score Weights,
and the Per Cent of Composite Criterion-Score Variance Contributed
by Each Science Achievement Test

          Raw-Score     Per Cent of Composite       Per Cent of Class Time
           Weight      Criterion-Score Variance        Devoted to Unit



The reliability coefficient of the weighted composite scores was
obtained algebraically by using the variances, weights, and
intercorrelations of the science-achievement test scores in a specific
application of the general formula for computing the product-moment
correlation coefficient between two sets of weighted sums (Davis,
1945, Equation 16). It proved to be .82. It may be noted that Test IV
was the second least reliable test (r = .39, see Table 3), and yet it
was weighted heaviest in the composite (23 per cent of variance, see
Table 5). This fact is one reason why the reliability of the weighted
composite scores is not greater than .82.
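
The logic of a composite reliability of this kind can be sketched as follows. This is not Davis's Equation 16 itself, which is not reproduced in the article; it is a commonly used form for the reliability of a weighted sum, assuming uncorrelated errors of measurement. The function name and the two-component check are hypothetical, and the actual raw-score weights are not used.

```python
from typing import Sequence


def composite_reliability(
    weights: Sequence[float],
    sds: Sequence[float],
    reliabilities: Sequence[float],
    corr: Sequence[Sequence[float]],
) -> float:
    """Reliability of a weighted sum of component test scores.

    Assumes errors of measurement are uncorrelated across components, so
    the composite error variance is the sum of the weighted component
    error variances.
    """
    k = len(weights)
    total_var = 0.0
    for i in range(k):
        for j in range(k):
            r_ij = 1.0 if i == j else corr[i][j]
            total_var += weights[i] * weights[j] * sds[i] * sds[j] * r_ij
    error_var = sum(
        (weights[i] * sds[i]) ** 2 * (1.0 - reliabilities[i]) for i in range(k)
    )
    return 1.0 - error_var / total_var


# Hypothetical check: two equally weighted parts, each with reliability .50
# and correlating .50, should give the Spearman-Brown value 2(.50)/(1 + .50).
example = composite_reliability([1, 1], [1, 1], [0.5, 0.5], [[1, 0.5], [0.5, 1]])
print(round(example, 3))
```

In a formula of this form a heavily weighted component with low reliability contributes a large share of the composite error variance, which is the point made above about Test IV.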

Relationships between Hunter Science Aptitude Test and Science 
Achievement Tests: The predictive validity of the Hunter Science 
Aptitude Test was estimated by computing Pearson product- 
moment correlation coefficients between scores on each of the two 
forms of the test and the weighted composite scores derived from 
the seven science-achievement tests administered during the course 
of the school year. For Form AX this coefficient is .77 (df = 56, 
p < .01); for Form BX it is .74 (df = 56, p < .01). These are
exceptionally high validity coefficients even when the time interval 
between the administration of the predictor tests (Forms AX and 
BX) and the criterion tests (Units I through VII) was as little as 
from two to eight months. In fact, when the two validity coefficients 
are corrected for attenuation, using the parallel-forms reliability 
coefficients of Forms AX and BX and the odd-even reliability co- 
efficient for the weighted composite score, the resulting coefficients 
are 1.08 for Form AX and 1.00 for Form BX. These coefficients 
probably represent slight overcorrections because the parallel-forms 
reliability estimates for Forms AX and BX take into account error 
variance resulting from day-to-day variations in the children’s 
performances as well as inter-examiner variability. The correction 
for attenuation assumes reliability estimates that exclude both 
these sources of error variance. In any event, neither corrected co- 
efficient is significantly greater than unity. Nonetheless, the data 
suggest that the predictive validity of Forms AX and BX is about 
as high as it can be, given their reliabilities and the reliability of
the criterion.
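
The correction for attenuation described above can be illustrated with the classical formula, in which the observed validity coefficient is divided by the square root of the product of the two reliabilities. The sketch below assumes that this is the formula applied and uses the rounded values reported in the text (.61 and .67 for the two forms, .82 for the composite criterion); because those inputs are rounded, the results agree with the reported 1.08 and 1.00 only to within about .01.

```python
import math


def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Classical correction: observed r over sqrt(product of reliabilities)."""
    return r_xy / math.sqrt(r_xx * r_yy)


CRITERION_RELIABILITY = 0.82  # weighted composite criterion, as reported

corrected_ax = correct_for_attenuation(0.77, 0.61, CRITERION_RELIABILITY)
corrected_bx = correct_for_attenuation(0.74, 0.67, CRITERION_RELIABILITY)
print(round(corrected_ax, 2), round(corrected_bx, 2))
```

A corrected coefficient above unity is possible whenever the reliability estimates entering the denominator are lower than the reliabilities the formula presupposes.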

A second, independent criterion was available in the science
teacher's estimate of the science ability of each child. During the
middle of the school year, the science teacher ranked the children
in the two classes separately on the ability they demonstrated in
class to learn science information and principles. These rankings
assigned by the teacher were correlated with scores on Forms AX 
and BX of the Hunter Science Aptitude Test. The Pearson product- 
moment coefficients (combined by Fisher’s z transformation) are 
presented in Table 6. The correlation between the two criteria (the 
weighted composite science-achievement scores and the science 
teacher's rankings) is .76 for Class 7-8X and .79 for Class 7-8Y. 

Since teachers’ ratings are usually unreliable, the predictive 
validity coefficients presented in Table 6 seem impressively high. 
It should be noted that they are consistent with the correlations be- 
tween the aptitude-test scores and the composite science-achieve- 
ment test scores. 

Several research workers (e.g., Brandwein, 1951, 1952) have
contended that one factor contributing heavily to science achievement
is high intelligence of the type revealed by success on tests of
intelligence. Stanford-Binet (1937 Revision) Intelligence Quotient scores
were available for each child in the sample used in this study. These
intelligence quotients were derived from tests administered from one
to three years prior to the school year during which the
science-achievement measures were obtained. Table 7 indicates that the
Pearson product-moment correlation coefficient between IQ and the
criterion of composite achievement-test scores is .21 (p > .05).
There is a statistically significant difference (p < .01) between the
degree to which the Hunter Science Aptitude Test and the
Stanford-Binet Intelligence Quotient predict science achievement as
measured by the composite criterion.
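
The article does not say which test was used to compare the two predictors. One conventional choice for two correlations that share a variable and come from the same sample is Hotelling's t, sketched below as an assumption rather than as the authors' procedure. The inputs are the reported coefficients for Form AX: .77 with the criterion, .21 for IQ with the criterion, and .14 for Form AX with IQ, with N = 58.

```python
import math


def hotelling_t(r_xy: float, r_zy: float, r_xz: float, n: int) -> float:
    """t statistic (n - 3 df) for the difference between r_xy and r_zy,
    two correlations with the same criterion y in a single sample."""
    det = 1 - r_xy**2 - r_zy**2 - r_xz**2 + 2 * r_xy * r_zy * r_xz
    return (r_xy - r_zy) * math.sqrt((n - 3) * (1 + r_xz) / (2 * det))


# Form AX vs. criterion (.77), IQ vs. criterion (.21), Form AX vs. IQ (.14).
t_ax = hotelling_t(0.77, 0.21, 0.14, 58)
print(round(t_ax, 2))
```

With 55 degrees of freedom the two-tailed 1 per cent critical value of t is roughly 2.7, so a statistic of this size is consistent with the significance level reported in the text.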

Stanford-Binet Intelligence Quotients predict the science teach- 
er’s ranking of science accomplishment to a smaller degree than 
they predict the composite criterion of science achievement. The 


TABLE 6 
Correlations between Form AX and Form BX of the Hunter 
Science Aptitude Test and the Science Teacher's 
Ranking of Science Classroom Performance 


Class 7-8X Class 7-8Y Combined 


(N = 28) N = 30) N = 58) 
Form AX .62* ( Jn А y .67° 
Form BX .56* JT ‚68* 


*› <<! 


LESSER, DAVIS AND NAHEMOW 361 
TABLE 7

Correlations between Stanford-Binet Intelligence Quotients and the Weighted Composite
Science Achievement Scores, the Teacher’s Ranking of Science Performance,
and the Hunter Science Aptitude Test (Form AX and Form BX)*

                                             Stanford-Binet IQ     Mean     S.D.
Science Achievement Tests                          .21            19.23     5.07
Teacher's Ranking of Science Performance           .09            29.50    16.74
Hunter Science Aptitude Test (Form AX)             .14            77.83    19.40
Hunter Science Aptitude Test (Form BX)             .09            70.17    23.26
Mean                                            150.40
S.D.                                              8.28

* When N = 58, an intercorrelation of +.26 is significantly different from zero at the 5 per
cent level; an intercorrelation of +.34 at the 1 per cent level.


Pearson product-moment correlation coefficients presented in Table 
7 between IQ and science teacher's rankings were .08 and .10 in the 
two classes. The coefficient of .09 in the two classes combined is not 
significantly different from zero at the five per cent level. 
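
The significance thresholds quoted in the footnote to Table 7 can be checked by converting a correlation to a t statistic with N - 2 degrees of freedom. The sketch below is illustrative only: the function name is hypothetical, and the two-tailed critical t values for 56 degrees of freedom (about 2.003 and 2.667) are assumed from standard t tables rather than taken from the article.

```python
import math

# Two-tailed critical t values for df = 56, assumed from standard t tables
# (the article itself does not list them).
T_CRIT_05 = 2.003
T_CRIT_01 = 2.667


def t_from_r(r: float, n: int) -> float:
    """Convert a Pearson r based on n pairs into a t statistic with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


if __name__ == "__main__":
    n = 58  # size of the combined sample
    for r in (0.09, 0.26, 0.34):
        t = t_from_r(r, n)
        print(
            f"r = {r:.2f}  t = {t:.2f}  "
            f"beyond .05 cutoff: {abs(t) >= T_CRIT_05}  "
            f"beyond .01 cutoff: {abs(t) >= T_CRIT_01}"
        )
```

Under these assumptions a coefficient of .09 falls well short of the 5 per cent cutoff, while .26 and .34 sit just beyond the 5 and 1 per cent cutoffs, respectively.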

There are a number of possible interpretations of these weak re-
lationships between verbal intelligence (as measured by the Stan-
ford-Binet Scales) and science achievement. The range of IQ scores
is greatly restricted. Also, these IQ scores were obtained from one 
to three years earlier than the science-achievement scores, while the 
scores for the Hunter Science Aptitude Test were obtained im- 
mediately prior to the school year during which achievement in- 
formation was gathered. For whatever reason, however, the Hunter 
Science Aptitude Test is a significantly better predictor of science 
achievement than Stanford-Binet IQ under circumstances such as 
those found in this study. 

Since clear-cut and consistent sex differences in science achieve- 
ment have been found at the secondary school and college level and 
beyond (e.g., Edgerton & Britt, 1944, 1947), an analysis to discover 
possible sex differences in science aptitude and achievement in ele- 
mentary school children was made. Consistent with previous ex- 
perience, the mean score of the boys (N = 25) exceeded the mean 
score of the girls (N = 33) on all three science measures (i.e., 
Forms AX and BX of the Hunter Science Aptitude Test and the 
composite science-achievement test); however, none of these three
differences reached statistical significance at the 5 per cent level. 


Discussion 

Although the reliabilities of the Hunter Science Aptitude Test 
are only moderately high, the predictive validity data indicate 
great promise for this test in predicting classroom science achieve- 
ment among gifted elementary-school children. For this sample of 
58 children at six and seven years of age, the two forms of the 
aptitude measure administered at the beginning of the school year 
displayed a high relationship to measures of science achievement
compiled throughout the course of the school year. It seems likely
that this relationship may be partly explained by the manner in
which the tests were constructed and followed up. In gathering
material for items to include in the Hunter Science Aptitude Test,
the third-grade course of science study was carefully observed and
analyzed. The content of the 182 items in the aptitude tests was 
modeled upon the science concepts and material—included in the 
third-grade course—with which the child is familiar by the third 
grade. The composite science-achievement test used as a criterion 
constitutes, of course, a direct representation of the actual science 
material covered in the third grade. Thus, the exceptionally high 
correlations obtained may be partly understood as a reflection of 
the fact that the same (or closely similar) kinds of abilities are 
measured in both aptitude and achievement tests. This conclusion 
suggests the need for follow-up studies of the children tested to 
obtain additional criterion variables as they take other science 
courses in elementary and secondary school and in college. It also 
suggests the use of these tests with representative samples of Ameri- 
can elementary-school children to permit further evaluation of the 
tests’ usefulness. It would be of great educational and social utility 
to have tests capable of identifying at an early age children in 
typical school populations with exceptional science ability. 


Summary 


The purpose of this research was to construct and validate an
instrument for the identification of science talent in young ele- 
mentary school children. 

Two forms of a 91-item Hunter Science Aptitude Test were con- 
structed. The items included in these tests were designed to measure 
(1) the ability to recall scientific information, (2) the ability to 
assign meaning to observations, (3) the ability to apply scientific
principles in making predictions, and (4) the ability to use the
scientific method. Both forms of the test were administered at the 
beginning of the school year to each child in a sample of gifted six- 
and seven-year-old elementary-school children drawn from the 
Hunter College Elementary School. The resulting scores were kept 
completely confidential until after criterion data had been obtained. 
These data were derived from a series of seven tests of science 
achievement in the third grade. The tests were administered at the 
completion of each science unit during the course of the entire 
school year. A single weighted composite science-achievement score 
was obtained for each subject. 

The parallel-forms reliability coefficient of the Hunter Science 
Aptitude Test was .64. The reliability of the weighted composite 
science achievement score was .82. The correlation of the aptitude- 
test and composite science-achievement scores was high—in fact, 
essentially the maximum possible, given their reliabilities. The cor- 
relation of Form AX with the criterion was .77; of Form BX, .71. 
Corrected for attenuation, these validity coefficients became 1.08 
and 1.00, respectively. 
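
The correction for attenuation mentioned above can be sketched with the classical Spearman formula, which divides the observed validity coefficient by the square root of the product of the two reliabilities. This is an illustration under stated assumptions, not a reconstruction of the authors' computation: the function name is hypothetical, and the single parallel-forms reliability of .64 is applied to both forms. On those assumptions the corrected values come out near 1.06 and .98, close to but not identical with the reported 1.08 and 1.00; form-specific reliability estimates, which are not given in this section, may account for the difference.

```python
import math


def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Spearman's correction: observed r divided by sqrt(r_xx * r_yy)."""
    if r_xx <= 0 or r_yy <= 0:
        raise ValueError("reliabilities must be positive")
    return r_xy / math.sqrt(r_xx * r_yy)


if __name__ == "__main__":
    aptitude_reliability = 0.64   # parallel-forms reliability reported in the text
    criterion_reliability = 0.82  # reliability of the composite criterion
    for form, validity in (("AX", 0.77), ("BX", 0.71)):
        corrected = correct_for_attenuation(
            validity, aptitude_reliability, criterion_reliability
        )
        print(f"Form {form}: observed {validity:.2f}, corrected {corrected:.2f}")
```

A corrected coefficient at or above 1.00 is read here, as in the text, as a sign that the observed correlation is about as high as the reliabilities allow.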

These extremely high predictive validity coefficients may be in 
part attributed to the fact that the Hunter Science Aptitude Test 
and the science achievement criterion overlap in the kinds of 
abilities and content measured. Both were modeled upon the science 
content of the third-grade science curriculum. This fact makes 
certain extensions of this research highly desirable. Validation stud- 
ies using other criteria with samples of gifted and of representative 
American school children would be of value. 


REFERENCES 


Angoff, W. H. “Test Reliability and Effective Test Length.” Psy- 
chometrika, XVIII (1953), 1-14. 

Brandwein, P. F. “Selection and Training of Future Scientists: 
II. Origin of Science Interests.” Science Education, XXXV 
(1951), 251-253.

Brandwein, P. F. “Selection and Training of Future Scientists:
III. Hypotheses on the Nature of Science Talent.” Science Edu-
cation, XXXVI (1952), 25-26.

Brandwein, P. F. The Gifted Student as Future Scientist. New
York: Harcourt, Brace and Company, 1955.

Brumbaugh, Florence N. and Roshco, B. Your Gifted Child. New 
York: Henry Holt and Co., 1959. 

Cole, C. C. Encouraging Scientific Talent. New York: College En- 
trance Examination Board, 1956. 




Cooley, W. W. “Attributes of Potential Scientists.” Harvard Edu- 
cational Review, XXVIII (1958), 1-18. 

Cooley, W. W. Career Development of Scientists: An Overlapping 
Longitudinal Study. U. S. Office of Education Report, 1959. 
(Mimeographed.) 

Davis, F. B. “The Reliability of Component Scores.” Psychomet-
rika, X (1945), 57-60.

Davis, F. B., Lesser, G. S., and French, Elizabeth G. “Identifica- 
tion and Classroom Behavior of Gifted Elementary School Chil- 
dren." Cooperative Research Monographs, 1960, No. 2, 19-32. 

Davis, W. “Search for Talent in Science.” In Witty, P. (Ed.) The 
Gifted Child. Boston: Heath & Co., 1951. 

Edgerton, H. A. “Two Tests for Early Identification of Scientific 
Ability." EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, XIX 
(1959), 299-304. 

Edgerton, H. A. and Britt, S. H. "Sex Differences in the Science 
Talent Test.” Science, C (1944), 192-193. 

Edgerton, H. A. and Britt, S. H. "Technical Aspects of the Fourth 
Annual Science Talent Search.” EDUCATIONAL AND PSYCHO- 
LOGICAL MEASUREMENT, VII (1947), 3-21. 

Fehr, H. F. “General Ways to Identify Students with Science and
Mathematics Potential.” Mathematics Teacher, XLVI (1953),
230-234.

Freeman, F. S. Theory and Practice of Psychological Testing. New
York: Henry Holt and Co., 1955.

Guilford, J. P. “Creativity.” American Psychologist, V (1950),
444-454.

Hunter College Elementary School Staff. Hunter College Elemen- 
tary School Gifted Children. New York: Hunter College Ele- 
mentary School, 1959.

Joint Program for Technical Education: Draft Report on Results
1956-1957. Columbia University School of Engineering, 1957.

MacCurdy, R. D. “Characteristics of Superior Science Students and
Their Own Sub-Groups.” Science Education, XL (1956), 3-24.

National Merit Scholarship Corporation. Technical Report No. 3. 
A Program of Research on the Identification, Motivation, and 
Training of Talented Students, 1959. 

Roe, Anne. The Making of a Scientist. New York: Dodd, Mead, 1953.


Seashore, H. The Search for Talent. New York: Psychological Cor-
poration, 1952.

Subarsky, Z. “What Is Science Talent?” Scientific Monthly, LXVI
(1948), 377-382.

Super, D. E. and Bachrach, P. B. Scientific Careers and Vocational
Development Theory. New York: Teachers College, Columbia
University, 1957.

United States Office of Education. Education for the Talented in 
Mathematics and Science, Bulletin 1952, No. 15. Washington: 
Government Printing Office, 1952. 




EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT
VOL. XXII, No. 2, 1962


STUDIES OF FORCED-CHOICE METHODOLOGY: 
INDIVIDUAL DIFFERENCES IN SOCIAL DESIRABILITY 


ELI SALTZ, MICHAEL REECE, AND JOEL AGER
Wayne State University 


Among the response sets which have been identified (Cronbach, 
1950), one of the most intensively studied has been that of social 
desirability (Edwards, 1957). Much of the research has been di- 
rected toward methods of eliminating the effect of social desira- 
bility on personality and interest test scores. 

One approach to the problem of minimizing social desirability 
has been to construct forced-choice versions of these tests. A typ- 
ical forced-choice item consists of two or more statements, each 
measuring a different factor and all equated for social desirabil- 
ity. The testee is instructed to pick the statement most applicable 
to himself. One rationale for such a procedure goes as follows: 
Since the statements are of equal social desirability, the testee 
cannot make himself look better by choosing one, rather than an- 
other, of the statements; consequently, he will answer honestly. Ob- 
viously the equating for social desirability is the critical manipu- 
lation in construction of a forced-choice test. The purpose of the 
present paper is to investigate the meaningfulness of such equating 
of social desirabilities. 

Social desirability is typically determined by having many sub- 
jects rate each of the potential test statements for social desira- 
bility. Items whose average social desirabilities are approximately 
equal over the group can then be paired in a single item. Edwards
(1957) finds that such average ratings have wide generality for
items such as those used in the Edwards Personal Preference Sched-
ule (1953); age, sex, and socio-economic status of the test group
have little effect on the average social desirability ratings of the
items.

However, there are individual differences in social desirability 
of a statement, and the effect of such individual differences has 
never been adequately evaluated. Since the Edwards Personal Pref- 
erence Schedule is one of the most carefully constructed of the 
forced-choice tests, it was used as the vehicle for the present study. 
In the present study, subjects were administered the statements 
as paired by Edwards for social desirability and were instructed 
to pick the statement in each pair that seemed most socially de- 
sirable. If individual differences in social desirability have little 
effect on performance on the test, scores for the fifteen traits meas- 
ured by the test should be approximately equal for each subject. 


Procedure 


The test instructions, indicated below, follow as closely as pos- 
sible the instructions originally used by Edwards (1957, p. 4) for 
determining the social desirability of the individual statements. 


INSTRUCTIONS 


Below you will find an example of two things that a person says 
that he likes or would like to do. These are called traits. A judge, 
such as yourself, has made an estimate of the desirability or un- 
desirability of these traits in people by circling the letter cor- 
responding to the letter opposite the trait. 
Example 1: A—To like to punish your enemies 

B—To like to read psychological novels 
Example 2: A—To like to go out with your friends 

B—To like to make excuses for your friends 
The person who judged these traits believes that "to like to 
read psychological novels” is a more desirable trait in others 
than is “to like to punish your enemies.” So he circled B. 
He also believes that “to like to go out with your friends” is a
more desirable trait in others than is “to like to make excuses 
for your friends.” So he circled A. 
Indicate your own judgments as to the relative desirability of 
the traits that will be given to you in the same manner. Re- 
member you are to judge the traits in terms of which you con- 
sider more desirable in others. Be sure to make a judgment about 
the traits for each question. 




The subjects were 56 students in Introductory Psychology at 
Wayne State University. The subjects knew nothing about the pur- 
pose of the study or about forced-choice scales. 


Results 


Edwards' scoring sheet for the Personal Preference Schedule is
constructed so that half the items on each trait (i.e., 14 items) are
scored by summing across rows, the other half are scored by sum- 
ming across columns. The total score for a trait is obtained by 
summing the row and column scores. Presumably the two halves 
are approximately equivalent. In the present study row and col- 
umn sums were correlated for each trait. The correlations are in- 
dicated in Table 1. 
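
The split-half analysis described above amounts to one Pearson correlation per trait between the row-sum and column-sum scores across subjects. The sketch below shows that computation for a single trait. The function name and the score lists are hypothetical and purely illustrative; they are not data from the study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list has no variance")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical row-half and column-half scores on one trait for six subjects.
    row_sums = [10, 7, 12, 5, 9, 11]
    column_sums = [9, 6, 13, 6, 8, 12]
    print(f"split-half r = {pearson_r(row_sums, column_sums):.2f}")
```

Repeating the call for each of the fifteen traits would yield a table of the same form as Table 1.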

Despite the very small number of items in each half of the scale 
for each trait, the correlations are sizable. This indicates that indi- 
vidual subjects are very consistent in the social desirability which 
they attribute to a trait. If equating within pairs for social desira- 
bility also eliminates individual differences in social desirability, 
the social desirability score for any trait on either half of the EPPS 
would be composed of random error. In this case the expected be- 
tween-halves correlations for each trait would be zero. 

Another indication of individual consistency in the ratings of so- 
cial desirability is obtained by comparing the profile of trait scores 


TABLE 1

Split-Half Correlations for Traits on Edwards PPS, Administered
Under Instructions to Answer each Forced-Choice Pair
in Terms of Social Desirability

Trait                  r
Affiliation           .60
Autonomy              .70
Exhibition            .23
Succorance            .57
Order                 .57
Dominance             .53
Deference             .42
Achievement           .72
Heterosexuality       .81
[illegible]           .72
[illegible]           .45
[illegible]           [illegible]
Endurance             .56
Nurturance            .63
Intraception          [illegible]




for each subject on the two halves of the test. Again, if the state- 
ments within each item are actually of approximately equal social 
desirability, any pattern of trait scores obtained in one half of the
test would be unrelated to the pattern obtained on the other half. 
Instead there is a high relationship between profile patterns for the 
two halves of the test for each subject. The median rho for the 56 
subjects was + .70. All the rhos were positive. 

On the basis of Edwards’ original construction of the scale, one 
would expect that the trait profiles obtained in this study were the 
result of individual preferences rather than systematic group pref- 
erences in social desirability among the traits. This was easily tested 
by obtaining a total score for each subject on each trait (by com- 
bining the scores on the two halves of the test) and doing a con- 
cordance analysis of the trait profiles, comparing the profiles of all 
subjects simultaneously. This produced the equivalent of a mean
rho of + .14. Thus it becomes clear that Edwards’ procedure for
equating on social desirability was successful in terms of the group
as a whole, but was far from successful in eliminating individual
estimations of social desirability.
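
The phrase "equivalent of a mean rho" reflects a standard identity linking Kendall's coefficient of concordance W to the average Spearman rho over all pairs of m rankers: mean rho = (mW - 1)/(m - 1). The sketch below illustrates the identity under simplifying assumptions: each subject's trait profile is treated as a ranking without ties, and all function names are hypothetical. The article does not report W itself; with m = 56 subjects, a mean rho of .14 would correspond to a W of roughly .155.

```python
from typing import Sequence


def kendalls_w(ranks: Sequence[Sequence[int]]) -> float:
    """Kendall's W for m subjects who each rank the same n objects (no ties)."""
    m = len(ranks)
    n = len(ranks[0])
    # Sum of the ranks given to each object across all subjects.
    totals = [sum(subject[j] for subject in ranks) for j in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def mean_rho_from_w(w: float, m: int) -> float:
    """Average Spearman rho over all pairs of the m subjects."""
    return (m * w - 1) / (m - 1)


def w_from_mean_rho(mean_rho: float, m: int) -> float:
    """Inverse of mean_rho_from_w."""
    return (mean_rho * (m - 1) + 1) / m


if __name__ == "__main__":
    print(f"W implied by mean rho .14 with 56 subjects: {w_from_mean_rho(0.14, 56):.3f}")
```

The identity makes clear why a low concordance across subjects can coexist with high within-subject consistency between the two halves of the test.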


Discussion 
The present study was designed as an investigation of the effec-
tiveness of the forced-choice technique in eliminating the effect
of individual differences in social desirability of test items. As the 
Edwards Personal Preference Schedule is one of the most carefully 
constructed forced-choice tests, it was used as the vehicle for this 


study. Consequently, while the conclusions of this study are most
directly applicable to the evaluations of the Edwards test, the writ- 


Ringwall, 1958; Edwards, Wright & Lunneborg, 1959) found cor-
relations of .88 and .56, respectively, between relative frequency
with which one group selected statements from Edwards pairs as 


not adequate. As indicated above, our data can be interpreted as
indicating Edwards’ procedure was successful in controlling for
group social desirability. When summing social desirability choices
across items for each need scale, almost no agreement among sub- 
jects was found as to the relative social desirability of the needs 
(average rho was .14). The apparent discrepancy between this study 
and the other two may be due in part to the units evaluated. It ap- 
pears from our results that failure to exactly equate statements 
within pairs for social desirability does not bias the need scores 
and that group desirability operates as random error at the trait 
score level. 

Thus, while the results suggest that the forced-choice technique 
may be successful in eliminating group standards of social desira- 
bility from the test, we are left with the problem of evaluating con- 
sequences of the individual social desirability which remains in the 
forced-choice test. 

Travers (1951) has suggested that forced-choice tests require the 
subjects to choose among alternatives none of which may be rele- 
vant for him. In defense of the forced-choice technique it is gen- 
erally assumed that, if all alternatives are inappropriate, subjects 
will choose randomly among them as they are of equal social de- 
sirability; this has the effect of increasing error variance but re- 
ducing “bias.” Results of the present study, however, suggest that 
when all alternatives are inappropriate (or equally appropriate) 
subjects may select the alternative which has the greatest personal 
social desirability. This may produce a bias in the form of a con- 
stant error throughout the test. 

Some writers (e.g., Gordon, 1951) have assumed a “projective” 
principle to operate in the forced-choice situation. Briefly, the ar- 
gument is that those alternatives perceived by the subject as more 
socially desirable tend to be those more like himself. One may ar- 
gue from this then that only group social desirability has a bias- 
ing effect; that individual social desirability is highly correlated 
with the trait in question. Even accepting this argument as reason- 
able, it would seem that such a mechanism would operate only for 
certain classes of traits. Clinicians point out that some types of 
traits are considered undesirable and are repressed or denied. To 
investigate this problem would require independent measures of the 
traits in question. 

It should be pointed out in addition that the present study has 
identified individual social desirability as a source of reliable var-
iance. The relative contribution of individual social desirability to
the total variance of a test given with the usual instructions was
not estimated. Studies in progress using a factor analytic approach¹
should help answer this further question.


REFERENCES 


Corah, N. L., Feldman, M. J., Cohen, I. S., Gruen, W., Meadow, 
A., and Ringwall, E. A. “Social Desirability as a Variable in the 
Edwards Personal Preference Schedule." Journal of Consulting 
Psychology, XXII (1958), 70-72. 

Cronbach, L. J. “Further Evidence on Response Sets and Test De- 
sign.” EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, X 
(1950), 3-31. 

Edwards, A. L. Manual for Edwards Personal Preference Schedule. 
New York: Psychological Corporation, 1953. 

Edwards, A. L. The Social Desirability Variable in Personality Re- 
search. New York: Dryden, 1957. 

Edwards, A. L., Wright, O. E., and Lunneborg, C. E. “A Note on
‘Social Desirability as a Variable in the Edwards Personal Pref-
erence Schedule.’” Journal of Consulting Psychology, XXIII
(1959), 558.

Gordon, L. V. “Validities of the Forced-Choice and Questionnaire 
Methods of Personality Measurement.” Journal of Applied Psy- 
chology, XXXV (1951), 407-412. 

Messick, Samuel. “Dimensions of Social Desirability.” Journal of 
Consulting Psychology, XXIV (1960), 279-287. 

Travers, R. M. W. “A Critical Review of the Validity and Ration- 
ale of the Forced-Choice Technique." Psychological Bulletin, 
XLVIII (1951), 62-70. 


¹ In a recent study, Messick (1960) has used the factor analytic approach to
investigate the dimensionality of individual social desirability. Using three
items from each of 14 need scales, he extracted nine factors. The dominant
factor accounted for 42 per cent of the common factor variance. It is of interest 
to note that Messick’s factors did not correspond in any systematic way to the 
traits; whereas in the present study consistent individual differences in social 
desirability were found at the trait level. 


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT
VOL. XXII, No. 2, 1962


AN INVESTIGATION OF RESPONSE SETS 
ON ALTERED PARALLEL FORMS 


GILBERT SAX AND ALBERT CARR
University of Hawaii 


In 1946 Cronbach identified a number of response sets or habits
which tend to cause “the subject to earn a different score from what 
he would earn if the same items were presented in different form” 
(Cronbach, 1946). Cronbach indicated that the presence of response 
sets tends to lower test reliability and that their presence reduces 
validity coefficients inasmuch as they are irrelevant to the criterion 
being measured. 

At least two ways exist in which items may be presented to an 
examinee on aptitude and achievement tests. Traditionally, all of 
the items measuring the same subject matter were grouped to- 
gether to form separate subtests. However, in the spiral-omnibus 
form of organization different types of items are intermixed and 
placed in ascending order of difficulty. Increasingly, but without 
empirical justification, test publishers have been employing the 
spiral-omnibus arrangement on tests of intelligence and aptitude 
in preference to using separate subtests, and although a number of 
logical arguments for and against the use of the spiral-omnibus form 
have been made (Findley & Scates, 1946, p. 62; Bradfield & More- 
dock, 1957, p. 114), no empirical studies have been found justifying 
its use. It was believed that an analysis of the spiral-omnibus and 
subtest form of item arrangements would not only provide compara- 
tive data on the two forms of test organization but would also con- 
tribute an understanding of response sets if subjects respond differ- 
entially on the two forms. 


Procedure 


Freshmen in an introductory course in education (N = 335) at the
University of Hawaii were given forms A and B of the Henmon- 
Nelson Tests of Mental Ability for College Students. Unaltered, 
these tests are arranged in spiral-omnibus form and contain vocabu- 
lary, mathematics, and spatial relationship items. For compara- 
tive purposes, however, one form of the Henmon-Nelson was cut 
and reorganized into separate subtests and then multilithed; on 
the other form the items were multilithed in the original spiral-
omnibus organization.¹

To reduce practice effects, approximately half of the group took 
the spiral-omnibus form first followed by the subtest form, whereas 
the procedure was reversed for the other half of the group. The di- 
rections and time limits were the same for students taking either of 
the parallel forms. To reduce the effect of speed of performance on 
test results, time limits were increased to forty minutes, ten minutes 
beyond the time limit indicated in the test manual. Scores attained
in this study are not therefore directly comparable to those scores 
attained by the standardization group. 


Results and Discussion 
Number of Items Attempted 


An examination of Table 1 indicates that the subjects tended to 
attempt more items on the spiral-omnibus form than on the sub- 
test form. With each subject as his own control, a t-ratio of 6.27 
was found between the number of items attempted on each of the 
two forms. This ratio is significant at the .001 level of confidence. 

In an effort to interpret the reasons for the discrepancy found in 


TABLE 1

Means and Standard Deviations of the Number of Items Attempted on the
Spiral-Omnibus and Subtest Form of Item Arrangement

Form               Mean     Standard Deviation
Spiral-Omnibus     80.13           9.42
Subtest            76.21          10.91

¹ Permission by the publishers to duplicate and alter the Henmon-Nelson
Tests is gratefully acknowledged.



the number of items attempted on the two forms, an analysis was
made of the mathematics items. The analysis consisted of two
phases: (1) a count of the number of mathematics items omitted
before the last item attempted; and (2) a count of the number of
mathematics items omitted after the last item attempted. Because
the items were originally equated as to difficulty level, no response 
set would be indicated if subjects responded in the same manner 
on the two forms. 

The results indicated that 596 mathematics items were omitted
by the subjects on the spiral-omnibus form before the last item at-
tempted, whereas a total of 571 mathematics items were omitted
by the same group on the subtest form before the last item at-
tempted. A chi-square of .54 was obtained for the difference in the
total number of mathematics items omitted prior to the last item
attempted on both forms of the test. With df = 1, this difference
was not statistically significant.

However, on the subtest form a total of 1172 mathematics items
were omitted after the last item attempted, whereas only 1028 
mathematics items were omitted after the last item attempted on 
the spiral-omnibus form. With 1 df a chi-square of 9.4 was found, 
significant at the .01 level of confidence. Evidently, subjects tend 
to do equally well up to a point on both forms of test organiza- 
tion, but beyond that point, perhaps because the items become in- 
creasingly more difficult on both forms and because the subtest 
form of organization does not provide subjects with a variety of 
tasks as they move from one item to the next, subjects tend to 
give up more readily on subtests than where various types of other 
tasks are provided as in the spiral-omnibus form of organization. 
In any case, the presence of a response set seems evident inasmuch 
as subjects responded differentially on two originally parallel forms 
when one form was altered on format alone. 
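
The two chi-square values reported above can be reproduced, to the rounding given, by a one-degree-of-freedom test of whether the omissions are split evenly between the two forms. The article does not state how the statistic was computed, so the even-split form used here is an assumption, as are the function name and the critical values (3.841 and 6.635, from standard chi-square tables).

```python
# Critical chi-square values for df = 1, assumed from standard tables.
CHI2_CRIT_05 = 3.841
CHI2_CRIT_01 = 6.635


def chi_square_equal_split(count_a: int, count_b: int) -> float:
    """One-df chi-square testing whether two counts depart from an even split."""
    expected = (count_a + count_b) / 2
    return ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected


if __name__ == "__main__":
    before = chi_square_equal_split(596, 571)    # omitted before last item attempted
    after = chi_square_equal_split(1172, 1028)   # omitted after last item attempted
    print(f"before: chi-square = {before:.2f}, beyond .05 cutoff: {before >= CHI2_CRIT_05}")
    print(f"after:  chi-square = {after:.2f}, beyond .01 cutoff: {after >= CHI2_CRIT_01}")
```

With the counts from the text this gives about .54 and 9.4, matching the reported values.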


Number Correct 


Subjects not only attempted more items on the spiral-omnibus
form than on the subtest form but also attained higher scores. Stu-
dent’s t for 334 df was 13.68, significant at the .001 level of confi-
dence. That these higher scores are not due entirely to the greater
number of items attempted is seen by comparing Table 1 to Table
2. It may be hypothesized (but cannot be conclusively demonstrated


TABLE 2

Means and Standard Deviations of the Number of Items Correct on the
Spiral-Omnibus and Subtest Arrangements

Form               Mean     Standard Deviation
Spiral-Omnibus     54.30           8.62
Subtest            45.18           9.95


by examining the raw data) that the presence of increasingly com-
plex items in a subtest tends to discourage students from respond-
ing to the more difficult items, and, conversely, the presence of dif-
ferent types of questions may provide some partial reinforcement
and motivation to continue if the subject is able to respond cor-
rectly, let us say, to a vocabulary item rather than having to face
the prospect of additional mathematics items when he has already 
had difficulty with a number of them. 


Reliability 


The manual for the Henmon-Nelson tests reports an alternate 
form reliability coefficient of .89 with a standard deviation on each
form of 10.2 for 171 freshmen. In the present investigation, how- 
ever, when the items are arranged in subtest form and correlated 
with the spiral-omnibus form, the correlations drop to .62 even 
though the standard deviations in the present sample approximate 
those reported for the standardization group (see Table 2). Con-
verting the two reliability coefficients to z values and dividing the
difference by the standard error of the difference between the two
z's yielded a ratio of 7.3, significant at the .001 level of confidence.
This tends to substantiate Cronbach’s thesis that the presence of a
response set reduces test reliability (Cronbach, 1946).
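
The comparison of the two alternate-form coefficients can be sketched with Fisher's r-to-z transformation for independent samples. The sample sizes are an assumption of this sketch: 171 for the manual's coefficient of .89 and 335 for the present coefficient of .62, as those are the group sizes mentioned in the text. The function name is hypothetical. On these assumptions the ratio comes to about 7.36, close to the reported 7.3.

```python
import math


def fisher_z_ratio(r1: float, n1: int, r2: float, n2: int) -> float:
    """Difference of two Fisher z values over its standard error (independent samples)."""
    z1 = math.atanh(r1)  # Fisher's z is the inverse hyperbolic tangent of r
    z2 = math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / standard_error


if __name__ == "__main__":
    ratio = fisher_z_ratio(0.89, 171, 0.62, 335)
    print(f"critical ratio = {ratio:.2f}")
```

The ratio is referred to the normal curve, so any value beyond roughly 3.3 is significant at the .001 level.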

Kuder-Richardson reliabilities (Formula 20) were also computed 
on both forms of test arrangement. The spiral-omnibus form yielded
a K-R reliability of .81, and the subtest form yielded a K-R of
.85. These differences were not statistically significant. However,
measures of internal consistency are inflated on tests in which
speed of response is an important variable. In an effort to com-
pare the influence of speed of performance on test reliability on
each of the two forms, Cronbach and Warrington's formula for es-
timating the reliability of partially speeded tests (1950) was em-
ployed. This formula yielded “corrected” reliability coefficients re-
duced equally in size by .11. No evidence, therefore, was found on
the differential effects of speed on the reliability coefficients. 
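
For readers unfamiliar with Kuder-Richardson Formula 20, the sketch below shows the usual computation from a matrix of right/wrong item scores. It is illustrative only: the function name and the tiny response matrix are hypothetical, the population variance of total scores is assumed, and the Cronbach-Warrington adjustment for speededness is not reproduced.

```python
from statistics import pvariance
from typing import Sequence


def kr20(responses: Sequence[Sequence[int]]) -> float:
    """KR-20 for a people-by-items matrix of 0/1 scores (population variance)."""
    n_people = len(responses)
    k = len(responses[0])
    totals = [sum(person) for person in responses]
    total_variance = pvariance(totals)
    if k < 2 or total_variance == 0:
        raise ValueError("need at least 2 items and some variance in total scores")
    pq_sum = 0.0
    for j in range(k):
        p = sum(person[j] for person in responses) / n_people  # proportion passing item j
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / total_variance)


if __name__ == "__main__":
    # Hypothetical responses of four examinees to three items.
    data = [
        [1, 1, 1],
        [1, 1, 0],
        [1, 0, 0],
        [0, 0, 0],
    ]
    print(f"KR-20 = {kr20(data):.2f}")
```

Because the formula depends on item intercorrelations within a single sitting, it tends to be inflated when many examinees fail to reach the later items, which is the concern raised in the text.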


Validity

Both the spiral-omnibus and the subtest form of item arrange-
ment were correlated with grade point average for 332 subjects
for whom records could be found. The manual for the Henmon-
Nelson Tests reports a validity coefficient of .60 between average
grades for the first term of attendance and the Henmon-Nelson
Tests. In the present investigation, the spiral-omnibus form corre-
lated .53 with first semester college grades, and the subtest form
correlated .52 with first semester college grades. With N = 332,
means and standard deviations on the spiral-omnibus form were 
54.62 and 8.40, respectively; means and standard deviations on the 
subtest form were 45.30 and 9.88, respectively. 

The correlations reported yield evidence as to the concurrent 
validities of the two forms. The difference between the two cor-
relations was not statistically significant.


Summary 


Three hundred thirty-five freshmen in an introductory course in 
education at the University of Hawaii were given forms A and B 
of the Henmon-Nelson Tests of Mental Ability. These tests are ar- 
ranged in spiral-omnibus form and contain vocabulary, mathe- 
matics, and spatial relationship items. On one form of the Hen- 
mon-Nelson Tests, items were cut and reorganized into three 
separate subtests; on the other form the items were presented in 
the original spiral-omnibus form. 

Inasmuch as subjects attempted significantly more items and
obtained significantly higher scores on the spiral-omnibus form
than on the subtest form, the results indicate the presence of a
response set dependent upon test format and form of item
presentation. At least on the mathematics items, subjects
tended to eliminate a significantly larger number of items at the
end of the subtest form than they did at the end of the
spiral-omnibus form. Statistically significant differences were not found
in internal consistency reliabilities, nor were statistically significant
differences found in the concurrent validity coefficients. Equivalent
form reliabilities, however, were significantly reduced when
items were rearranged in subtest forms.


REFERENCES 


Bradfield, J. M. and Moredock, H. S. Measurement and Evaluation
in Education. New York: The Macmillan Co., 1957.
Cronbach, L. J. “Response Sets and Test Validity." EDUCATIONAL 
AND PSYCHOLOGICAL MEASUREMENT, VI (1946), 475-494. 
Cronbach, L. J. and Warrington, W. G. “The Reliability of Speeded 
Tests.” Psychometrika, XV (1950), 259-269. 
Findley, W. G. and Scates, D. E. The Measurement of Understand- 
ing, Forty-Fifth Yearbook of the NSSE, Part 1. Chicago: The 
University of Chicago Press, 1946. 


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT
Vol. XXII, No. 2, 1962


VALIDITY STUDIES SECTION 


Edited by
WILLIAM B. MICHAEL 
University of Southern California 


The Validity of the Objective-Analytic Personality Test Battery
in Navy Settings. ROBERT R. KNAPP

Achievement, Aptitude, and Personality Measures as Predictors
of Success in Nursing Training. RUSSELL HANEY,
WILLIAM B. MICHAEL, AND ARTHUR GERSHON

A Study of the Validity of the Pre-Engineering Ability Test.
REGINALD L. JONES

Comparing Zero-Order Correlation from SCAT Total and
Multiple Correlation from SCAT Q and V at Southern Illinois
University. JOHN W. LEWIS

High School Record and College Board Scores as Predictors
of Success in a Liberal Arts Program During the Freshman
Year of College. WILLIAM B. MICHAEL, ROBERT A.
JONES, ANNA COX, ARTHUR GERSHON, MARVIN HOOVER,
KENNETH KATZ, AND DENNIS SMITH

Utilizing the Stepwise Multiple Regression Procedure in Selecting
Predictor Variables by Sex Group. JOHN W. LEWIS




THE VALIDITY OF THE OBJECTIVE-ANALYTIC
PERSONALITY TEST BATTERY IN NAVY SETTINGS¹


ROBERT R. KNAPP


U.S. Navy Medical Neuropsychiatric Research Unit 
San Diego, California 


Background. The measurement of personality factors by objective
tests, as defined by Cattell (1957), is a relatively recent
achievement. In attempting to assess dimensions of neuroticism
and psychoticism, Eysenck (1952) has reported success in differentiating
neurotics and psychotics from normals using such tests.
More recently, Cattell (1955) has presented the Objective-Analytic
Personality Test Battery (O-A Battery) which yields scores on 18
personality factors in the objective test domain. The group form
of the O-A Battery consists of some 38 separate paper-and-pencil
tests from which approximately 80 scores are derived. These test
scores are then standardized and summed in such a way as to yield
scores for the 18 objective personality factors. The first twelve of
these factors represent the best confirmed, that is, most clearly
defined, factors. The remaining six factors are less clearly defined
and are referred to as extension factors awaiting further confirmation
in other factor analytic investigations.

The reader is referred to Eysenck (1959) for a review of this bat- 
tery and to Sells (1959) for a review of the broader system into 
which this battery fits. 



1 Based in part on a paper presented at the American Psychological Associa- 
tion, New York, September 4, 1961. 

The present data were collected under Bureau of Medicine and Surgery Proj- 
ect Numbers MR 005.14-2100 and MR 005.15-1001.1.3. The opinions expressed are 
those of the author and are not to be construed as being official or in any way 
representative of the United States Navy. The author wishes to acknowledge 
the assistance of William J. Moonan in computing the multiple regression
analyses.




Problem. The O-A Battery has been administered experimentally 
to groups of Navy personnel to investigate its predictive and con- 
current validity against criteria of importance to the Navy. The 
studies considered here involved samples from two different Navy 
populations. The first sample was taken from enlisted personnel 
being processed for admission to the Navy Submarine School, New 
London, Connecticut. The second sample was taken from a group 
of qualified Marine Corps officer pilots who had been assigned to 
an operational helicopter group. 


Study I 


Subjects. This sample consisted of 315 Navy enlisted men com- 
pleting final assessment procedures before starting a course of in- 
struction for duty in the Submarine Service. The complete O-A 
Battery was administered during this processing period. 

Criteria. Two criteria were considered: (1) whether the individual
completed the course or was dropped prior to graduation,
and (2) the final standing in the class for those graduating. Of
the 315 subjects starting the course, 36 were dropped from the
course of instruction.

Results. Biserial correlations against the pass-drop criterion are 
presented in Table 1 as are product-moment correlations against 
class standing for those graduating. Three of the correlations be- 
tween O-A Battery factors and the pass-drop criterion were sig- 
nificant beyond the .05 level as were 11 correlations with the class 
standing criterion. 
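
As an illustration of the biserial coefficient used against the pass-drop criterion, the following Python sketch computes it from raw scores and a pass/drop indicator. It assumes the usual textbook form, in which the mean difference between groups is scaled by the overall standard deviation and by pq divided by the normal ordinate at the point of dichotomy. The function name `biserial_r` and the sample values are hypothetical and are not taken from the study.

```python
from statistics import NormalDist, mean, pstdev


def biserial_r(scores, passed):
    """Biserial correlation between a continuous score and a pass/drop split.

    scores: list of numeric test scores.
    passed: list of booleans (True = passed), same length as scores.
    A positive value means the pass group has the higher mean score.
    """
    pass_scores = [s for s, ok in zip(scores, passed) if ok]
    drop_scores = [s for s, ok in zip(scores, passed) if not ok]

    p = len(pass_scores) / len(scores)  # proportion passing
    q = 1.0 - p                         # proportion dropped

    # Ordinate of the unit normal curve at the point dividing p from q.
    z_cut = NormalDist().inv_cdf(p)
    ordinate = NormalDist().pdf(z_cut)

    mean_diff = mean(pass_scores) - mean(drop_scores)
    return (mean_diff / pstdev(scores)) * (p * q / ordinate)


# Hypothetical scores for six candidates, three of whom passed.
example_scores = [1, 2, 3, 4, 5, 6]
example_passed = [False, True, False, True, False, True]
print(round(biserial_r(example_scores, example_passed), 3))
```

The coefficient assumes a normally distributed variable underlying the dichotomy; with a split as uneven as 36 drops among 315 candidates, that assumption carries more weight than it would near an even split.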

TABLE 1

[...] Correlates of Objective Personality Test Factor Scores Obtained
from the Objective-Analytic Personality Test Battery

[Table body not recoverable from the scan; the page was printed sideways
and the entries are garbled. Rows are the 18 O-A Battery factors, UI 16
through UI 33. Columns give correlations for the Submarine School
candidates (pass-drop, class standing, and class standing with GCT
partialed out) and for the Marine Corps helicopter pilots (peer ratings
and sick call frequency).]

Legible footnotes: Correlations against the class standing criterion are
based on an N of 279. For the pilot sample, correlations are based on an
N of 81 for the peer rating criteria. A positive direction indicates that
the mean of the pass group is higher. The last column for the Submarine
School sample shows correlations between personality factors and class
standing with the effects of verbal intelligence, as measured by the GCT,
partialed out.
* Significant at the .05 level.
** Significant at the .01 level.

It will be noted that the validities for the class standing criterion
range up to .27 and those for the pass-drop criterion up to .31. These
are only somewhat higher than validities characteristically obtained
from standard personality inventories at the Submarine School. For
example, in this sample the validities obtained from factors of the
Guilford Inventory of Factors STDCR and the Guilford-Martin Personnel
Inventory and Inventory of Factors GAMIN were as high as .21 against
the class standing criterion and .29 against the pass-drop criterion.
As would be expected, the validities obtained from the Navy Basic Test
Battery against the class standing criterion were slightly higher. The
Navy General Classification Test (GCT), a verbal aptitude test,
correlated .29 with class standing but a simple sum of the Arithmetic
(ARI) and Mechanical (MECH) scores yielded a validity of .40.
Of course these validities are underestimates since they are not
corrected for restriction due to selection.
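
The remark about restriction due to selection can be illustrated with the standard correction for direct selection on the predictor. The Python sketch below is offered only under that assumption; the study reports neither the unrestricted nor the restricted standard deviations, so the values passed to the hypothetical function `correct_for_restriction` are invented for illustration.

```python
import math


def correct_for_restriction(r, sd_unrestricted, sd_restricted):
    """Estimate the unrestricted validity from a restricted-sample correlation.

    Uses the standard formula for direct (explicit) selection on the
    predictor: R = r*k / sqrt(1 - r^2 + r^2 * k^2), with k the ratio of the
    unrestricted to the restricted predictor standard deviation.
    """
    k = sd_unrestricted / sd_restricted
    return (r * k) / math.sqrt(1.0 - r * r + r * r * k * k)


# Hypothetical illustration: an observed validity of .40 in a selected group
# whose predictor standard deviation is half that of the applicant pool.
print(round(correct_for_restriction(0.40, 2.0, 1.0), 3))
```

With no restriction (k = 1) the function returns the observed correlation unchanged, and the corrected value rises as the selected group becomes more homogeneous.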

Since a few of the measures in the O-A Battery appear to paral- 
lel tests often used in ability measurement, it was felt that it would 
be of interest to partial out the effects of intelligence as estimated 
by the General Classification Test. These partial correlations are 
also presented in Table 1. Of the 11 significant correlations ob- 
tained between class standing and the objective personality factors, 
6 still reached significance with GCT partialed out. Thus, it ap- 
pears that certain of the O-A Battery factors are accounting for 
variance not measured by the General Classification Test, even 
though the correlations are of a relatively low magnitude. 
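
The partialing step described above follows the first-order partial correlation formula. A minimal Python sketch is given here; the function name `partial_r` is an assumption, and the input correlations are hypothetical, since the correlations of the individual factors with GCT are not reported in the text.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant.

    Here x would be an O-A factor, y class standing, and z the GCT score.
    """
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return numerator / denominator


# Hypothetical values: a factor correlating .27 with class standing and
# .30 with GCT, where GCT correlates .29 with class standing.
print(round(partial_r(0.27, 0.30, 0.29), 3))
```

A factor keeps its significance after partialing only to the extent that its relation to the criterion does not run through verbal aptitude, which is the sense in which the surviving factors account for variance not measured by the GCT.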

Multiple regression analyses were undertaken to determine what 
effect using combinations of the O-A Battery and the present Navy 
selection tests would have on the predictability of class standing. 
Tests currently used in selection for Submarine School are the Gen- 
eral Classification Test and a combination of the Arithmetic and 
Mechanical Tests. Table 2 presents the multiple correlations ob- 
tained against the class standing criterion. 

It can be seen from Table 2 that the multiple r for the entire
18-factor O-A Battery was .45 as compared with a multiple of .41 for
the Navy selection tests currently used. However, the latter coefficient
was not corrected for restriction due to selection.
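
For the two-predictor case, the multiple correlation can be written in closed form from the three zero-order correlations. The Python sketch below assumes that form; the function name `multiple_r_two` is illustrative, and the intercorrelation of .50 between the two predictors is a purely hypothetical value, since the study does not report the correlation between (ARI + MECH) and GCT.

```python
import math


def multiple_r_two(r_1y, r_2y, r_12):
    """Multiple correlation of a criterion y on two predictors.

    r_1y, r_2y: validities of predictors 1 and 2 against the criterion.
    r_12: correlation between the two predictors.
    """
    r_squared = (r_1y ** 2 + r_2y ** 2 - 2.0 * r_1y * r_2y * r_12) / (1.0 - r_12 ** 2)
    return math.sqrt(r_squared)


# Validities of .40 and .29 are those reported for (ARI + MECH) and GCT;
# the predictor intercorrelation of .50 is assumed for illustration only.
print(round(multiple_r_two(0.40, 0.29, 0.50), 3))
```

The sketch shows why a second predictor adds little when it overlaps substantially with the first: the higher the intercorrelation, the closer the multiple correlation stays to the larger single validity.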

TABLE 2

Correlations and Multiple Correlations of
Tests Against Class Standing Criterion

                                          Correlations and
Tests                                     Multiple Correlations
(ARI + MECH)                                     .40
GCT                                              .29
18 O-A Battery Factors                           .45
(ARI + MECH) + GCT                               .41
(ARI + MECH) + GCT + 18 O-A Factors              .51

In predicting the pass-drop criterion, much the same general
picture emerges. Biserial correlations, against the drop criterion,
of .20 and .29 were obtained for GCT and ARI + MECH, respectively.
Significant biserial correlations from the O-A Battery of
.31 for UI 29, Immediate Over-responsiveness; .26 for UI 24, Anxiety;
and .23 for UI 16, Assertiveness were obtained.

While the over-all prediction obtainable from the present non- 
cognitive battery is roughly equivalent to results obtainable from 
the Navy battery of cognitive tests, it can be seen that adding the 
present O-A Battery factors to the Navy selection tests does not 
markedly increase predictability, the multiple correlation being 
.51. Thus, it appears doubtful that the increase in validity obtained 
with the present objective test factors would justify their use in 
predicting success in Submarine School. 

However, it is of interest to examine what personality charac- 
teristics, as measured by the O-A Battery, are associated with suc- 
cess in this present training situation. If one were to consider only 
those variables where the .01 confidence level was attained against 
at least one of these criteria, five of the O-A Battery factors would 
appear to be important to success in Submarine School training. 
These factors in Cattell's (1957) Universal Index System are UI 
16, UI 19, UI 21, UI 24, and UI 29. High scores on each of these 
objective test factors are associated with greater success in the pres- 
ent training situation and in three instances (UI 16, UI 24, and UI 
29) significance was obtained against both of the criteria of success. 
A review was made of the interpretations given to these factors so 
that a brief summary could be given of the characteristics asso- 
ciated with successful completion of Submarine School. To aid in 
this, the correlations between class standing and individual tests 
contributing to the measurement of these factors were also exam- 
ined. From these relationships the successful candidate is depicted 
as being (from UI 16) more determined (or assertive) and effective 
in his actions, not easily upset, high in perceptual speed, and per- 
haps perceptual fluency; and, (from UI 19) more critical, careful 
and exact—more “correct”—in his test performances. Further, 
(from UI 21) he is depicted as being more exuberant, faster in his 
test performances; and, (from UI 24) more anxious (which may 
be interpreted as an "anxiety to achieve"), more willing to admit 
his weaknesses; and, (from UI 29) easily able to mobilize his energy
to meet the demands of a situation and then to relax quickly
(more flexible).


Study II 


Subjects. The sample consisted of 81 Marine Corps officer pilots 
assigned to an operational helicopter air group. For this sample 
only the first 12 O-A Battery factors were considered. 

Criteria. The criterion measures were obtained several weeks 
after the testing sessions. Three of these criteria were peer ratings 
of Pilot Proficiency, Officer-Like Qualities and Social Acceptability. 
The peer nomination technique was used with subjects asked to 
nominate the highest 25 per cent and the lowest 25 per cent of the 
pilots in their squadron on each characteristic. The instructions 
and trait definitions for use in the peer nominations were given to 
the participating subjects on separate forms and are presented be- 
low. The appropriate N for each squadron appeared where brack- 
eted percentages are shown below. 


Pilot Proficiency. "From this list of names of pilots in your squad- 
ron, we would like you to pick out (a) the [25%] whom you con- 
sider the most proficient pilots (considering all the factors that 
are generally thought of as going to make up a ‘professional’ naval 
aviator), and (b) the [25%] whom you would rank as the least 
proficient of this particular group. You need indicate no preference 
or rank within these two groups of [25%]. Your answers will be 
confidential, and you need not sign your name. Please note that
this ranking system applies only to the group of people named 
here. You are not necessarily implying that those you picked as 
‘most proficient’ are the best you’ve ever flown with—nor that those 
whom you picked ‘least proficient’ have any deficiencies at all. You 
are merely indicating their comparative standing, in your opinion, 
within this group.” 

Officer-Like Qualities. “Next, rank the members of this group, in 
the same way, as to the presence of those qualities which are or- 
dinarily thought of as ‘officer-like qualities.’ In other words, pick 
the [25%] who are (a) considered by you to be highest in military 
proficiency and officer-like qualities, and (b) the [25%] who 
would rank lowest, within this group, in the same qualities.” 

Social Acceptability. “Lastly rank the members of this group in a
similar manner, as to: (a) the [25%] who seem to ‘fit in’ best with
the squadron as a whole, in other words those who contribute the
most, by reason of their personality and general disposition, to
harmony and good feeling; and (b) the [25%] whom you consider
to rank the lowest in this same category. It may be of help
if you would try to pick these men as though you were expressing 
preference among them, as to those whom you would most like to 
have as a companion on an extended cruise (purely from the stand- 
point of personality and social acceptability) and those whom you 
would least desire under the same circumstances.” 


Results. The correlations between the objective test factors and
the peer ratings ranged as high as .39 in absolute value; this highest
value, a negative correlation of -.39, was obtained between UI 16 and
peer ratings of Pilot Proficiency. The factors which generated
significant validity and which thus appear to be of greatest interest
in terms of the peer rating criteria are UI 16, UI 17, UI 21, UI 24,
and UI 26. A review of the factor interpretations and of the tests
composing these factors suggests the following characteristics of the
more proficient helicopter pilot. He tends (from UI 16) to be
slower in his test performances, displays lower perceptual speed
and fluency, and does not profess an interest in “highbrow” tastes.
Among his peers he is the less assertive individual who places
relatively greater emphasis on institutional rather than personal
values. From UI 17 he is seen to be less suggestible and less apt to
indulge in carping criticism or mischievous humor. UI 21 depicts
the proficient pilot as being less exuberant. Those rated high on
social acceptability tend to be (from UI 24) the less anxious
individuals who do not report being irritated by many common
annoyances and are those who (from UI 26) display a higher degree
of self-sentiment control.

A fourth criterion, an index of frequency of sick call attendance,
was also obtained and the results considered elsewhere (Knapp,
1961). A high rate of sick call attendance tended to be associated
with lower peer ratings, the correlations being -.39, -.28, and
-.36 with ratings of Pilot Proficiency, Officer-Like Qualities, and
Social Acceptability, respectively. In line with these findings, the
factor (UI 16) having the highest negative correlation with the
ratings (r = -.39) also had the highest correlation with sick call
frequency (r = .51).

Inspection of the validities obtained against the various criteria
suggests that those criteria characterized by the most objective
definitions yielded the highest validities. One might speculate that as
the objectivity of the criterion increases the magnitude of the high- 
est obtained validity also increases. Although it is perhaps difficult 
to assess the relative objectivity of the three peer rating criteria, 
indirect evidence suggests that ratings of Pilot Proficiency would 
be most amenable to objective assessment. This evidence includes 
the correlation between peer ratings and commanding officers’ rat- 
ing on the three traits. For the Pilot Proficiency criterion the aver- 
age correlation, for the five squadrons, between commanding of- 
ficers’ ratings and peer ratings was .70. However, for both the 
Officer-like Qualities and Social Acceptability ratings, average cor- 
relations of about .50 were obtained, thus suggesting a lesser de- 
gree of objectivity for these latter two criteria. Further evidence 
that higher validities are associated with the more objectively as- 
sessed criteria is suggested by examining correlations against fre- 
quency of sick call visits, the most objective criterion. Here corre- 
lations with O-A Battery factors were as high as .57. 

In this sample of Marine Corps pilots, the correlations between 
objective test factors and the criteria are markedly higher than 
those obtained from questionnaires. The Minnesota Multiphasic 
Personality Inventory and the Guilford-Zimmerman Temperament 
Survey had also been administered (Knapp & Most, 1960). The 
MMPI scales yielded only three significant correlations against the
four criteria under consideration (the highest r being .36 between 
the Mf scale and sick call frequency) and the Guilford-Zimmerman 
traits yielded none. Thus, in this setting the measurement of per- 
sonality traits through objective tests would appear to be particu- 
larly promising. 

Conclusions. Although the present findings were obtained on com- 
paratively small samples and omitted the nicety of cross-valida- 
tion, it is concluded that the results in predicting certain of the 
criteria are promising. In every instance prediction from the O-A 
Battery was as good as or better than that obtained from the per- 
sonality questionnaires considered, when prediction from single 
O-A factors was compared with prediction from single questionnaire
scales. Further refinement of the O-A Battery is currently
under way and this may add substantially to the validity obtainable
from objective test factors.




REFERENCES 


Cattell, R. B. Handbook for the Objective-Analytic Personality 
Test Battery. Champaign, Ill.: Institute for Personality and 
Ability Testing, 1955. 

Cattell, R. B. Personality and Motivation Structure and Measure- 
ment. New York: World Book Company, 1957. 

Cattell, R. B. “A Universal Index for Psychological Factors.” Psy- 
chologia, X (1957), 74-85. 

Eysenck, H. J. The Scientific Study of Personality. London: Routledge
& Kegan Paul, 1952.

Eysenck, H. J. “Objective-Analytic Personality Test Batteries.” In 
O. K. Buros (Editor), The Fifth Mental Measurements Year- 
book. Highland Park, New Jersey: Gryphon Press, 1959, pp. 
170-172. 

Knapp, R. R. “Objective Personality Test and Sociometric Correlates
of Frequency of Sick Bay Visits.” Journal of Applied Psychology,
XLV (1961), 104-110.

Knapp, R. R. and Most, J. A. “Personality Correlates of Marine
Corps Helicopter Pilot Performance.” USN Medical Field Re- 
search Laboratory Report (1960), No. HR 005.15-1001.1.3. 

Sells, S. B. “Structured Measurement of Personality and Motiva- 
tion: A Review of Contributions of Raymond B. Cattell.” Jour- 
nal of Clinical Psychology, XV (1959), 3-21. 




ACHIEVEMENT, APTITUDE, AND PERSONALITY 
MEASURES AS PREDICTORS OF SUCCESS 
IN NURSING TRAINING 


RUSSELL HANEY 
Los Angeles, California. 


WILLIAM B. MICHAEL AND ARTHUR GERSHON
University of Southern California 


Problem. For a sample of 82 freshman trainees in student nursing
at the Los Angeles County Hospital, the major purpose of the
investigation was to estimate the predictive validity of scores on
two achievement tests (California Reading Test and California
Mathematics Test), on five parts of the Employee Aptitude Survey
Tests (EAST 3, 4, 5, 9, and 10), and on the 12 scales of the MMPI,
as well as the predictive value of grade point averages in all high
school academic courses and in two semesters of high school
chemistry alone. Each predictor was evaluated against five criterion
measures: grades in courses of anatomy and physiology, microbiology,
and medical and surgical practice (two parts), and class rank in
performance in the hospital ward on each of 19 items, a rank order
based on independent judgments of three observers. A secondary
purpose was to obtain additional cross-validational data on speeded
perceptual and spatial tests that in two previously reported studies
(Haney, Michael & Jones, 1959; Michael, Jones & Haney, 1959) yielded
several significantly negative validities, but not in a subsequent
investigation (Haney et al., 1960). All variables are enumerated
in Table 1. (The nineteen variables upon which students were
ranked pertained to characteristics involving patient care, techniques
of medication, and personal qualities such as judgment,
initiative, dependability, and planning.)


TABLE 1

Predictive Validity Coefficients of Twenty-Three Tests and Two High School Measures
of Achievement with Respect to Grades in Four Courses and Ward Rating Along
with Intercorrelation of These Five Criterion Measures for the Fall, 1961,
Class of Los Angeles County General Hospital
(N = 82)

                                              Validity Coefficients of Predictors*
                                              Along with Intercorrelations of
                                              Criterion Measures
Predictors and Criterion Measures             (26)  (27)  (28)  (29)  (30)†

 1. Calif. Reading Test (Total)                ??    ??    ??    ??    ??
 2. Calif. Reading Test (Vocabulary)           34    18    ??    24    13
 3. Calif. Reading Test (Comprehension)        21    31    25    22    04
 4. Calif. Math Test (Total)                   20    26    20    26    12
 5. Calif. Math Test (Reasoning)               22    34    31    34    19
 6. Calif. Math Test (Fundamentals)            17    17    13    14    02
 7. EAST No. 3: Visual Pursuit                 30   -17   -14   -04   -09
 8. EAST No. 4: Visual Speed and Accuracy      22    05    07    10    06
 9. EAST No. 5: Space Visualization            35    12    05    16    09
10. EAST No. 9: Manual Speed and Accuracy     -02   -15   -15   -12   -03
11. EAST No. 10: Symbolic Reasoning            15    02    02    15    04
12. MMPI: L Scale                             -01   -17   -17   -22    ??
13. MMPI: F Scale                              01    00    00   -13   -14
14. MMPI: K Scale                              01   -06   -05   -11   -08
15. MMPI: Hs + .5K Scale                      -13   -27    ??    ??   -31
16. MMPI: D Scale                              02    03    03   -03    05
17. MMPI: Hy Scale                            -04   -07   -03   -16   -08
18. MMPI: Pd + .4K Scale                      -28   -21   -24   -19   -17
19. MMPI: Mf Scale                            -10   -10   -06    06   -08
20. MMPI: Pa Scale                            -09   -09   -07   -09   -10
21. MMPI: Pt + 1K Scale                        16    09    16    04    04
22. MMPI: Sc + 1K Scale                        07    00    04   -00   -04
23. MMPI: Ma + .2K Scale                      -07   -16   -09   -07   -13
24. High School Grade Point Average            29    35    36    37    14
25. High School Chemistry (Two Semester GPA)   35    43    38    22    ??
26. Anatomy and Physiology (Grades)            --    61    60    57    30
27. Microbiology                               61    --    80    ??    ??
28. Medical-Surgical Practice II               60    80    --    71    ??
29. Medical-Surgical Practice I                57    ??    71    --    ??
30. Ward Performance Scale                     30    ??    ??    ??    --

[?? = entry illegible in the source.]

* [Uncorrected] for restriction of range, coefficients of .22 and .28 are
significant at the .05 [and .01 levels, respectively].

† The values are based on the median correlation coefficient of each of the
predictor variables (1-25) and of the four criterion measures (26-29) with
the 19 items of the Rating Scale of Ward [Performance].

Statistical Treatment. Through use of the IBM 7090 correlation
program at the Western Data Processing Center at U.C.L.A.,¹
product moment correlations were calculated among the first 29
variables cited in Table 1 as well as on the 19 items of the Ward
Performance Scale. Since the students were ranked in each of the
19 items in small class units of 13 to 20 students, a normalized-rank
method permitting conversion to stanine scores for unequal
numbers was employed as described in Guilford (1954, pp. 181-183).
In view of the coarseness of the data and the existence of a
marked degree of restriction in range in scores upon the first six
variables (since students were selected at or above the 50th centile
on total scores in Reading and Mathematics), as well as in
view of the associated restriction on measures (involving incidental
selection) correlated with these two tests, the correlation
coefficients may be considered as minimal estimates of predictive
validity.
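
The conversion of ranks from class units of unequal size to a common stanine scale can be sketched as follows. This Python example is only an approximation under stated assumptions: it converts each rank to a centile at the midpoint of its rank interval, maps that centile to a unit normal deviate, and rounds 2z + 5 to the nearest stanine. It is not claimed to reproduce the tables in Guilford (1954), it does not handle tied ranks, and the function name `rank_to_stanine` is hypothetical.

```python
import math
from statistics import NormalDist


def rank_to_stanine(rank, group_size):
    """Convert a rank within a group to an approximate stanine score.

    rank: 1 = highest-ranked student in the class unit.
    group_size: number of students ranked in that class unit.
    """
    # Centile of the midpoint of the rank interval, counted from the bottom.
    centile = 1.0 - (rank - 0.5) / group_size

    # Normal deviate corresponding to that centile.
    z = NormalDist().inv_cdf(centile)

    # Stanines have mean 5 and standard deviation 2, bounded by 1 and 9.
    stanine = int(math.floor(2.0 * z + 5.0 + 0.5))
    return min(9, max(1, stanine))


# A class unit of 13 students: top, middle, and bottom ranks.
print([rank_to_stanine(r, 13) for r in (1, 7, 13)])
```

Placing all class units on one normalized scale is what allows the rankings from groups of 13 to 20 students to be pooled into a single N of 82 for correlation.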

Results. The following principal findings may be summarized:
1. As in the three previous studies cited, achievement tests in read- 
ing and mathematics are significantly related to formal course 
work but not to standing in measured characteristics of ward 
effectiveness. 

2. Consistent with the finding in the last reported investigation by 
Haney and others (1960), the speeded tests in visual and percep- 
tual activities as measured by EAST (3, 4, 5, 9) are not signifi- 
cantly related in a negative manner to course grades in nursing 
training or to performance in the ward, although the tests 3, 4, 
and 5 of the EAST show significant positive validities with grades 
in Anatomy and Physiology. 

3. Of interest is the fact that the Hs and Pd scales of the MMPI 
register statistically significant negative validities, although cross- 
validational studies are clearly indicated. Noteworthy indeed is the 
fact that, with respect to every one of the 19 items on the Ward 
Performance Scale, a negative validity coefficient for the Hs scale
significant beyond the .01 level was found (although none of those 
values is reported in Table 1). 

4. Chemistry grades alone in high school are slightly more predictive
of grades in the four training courses than is over-all high
school grade point average (although all coefficients but one are
significant beyond the .01 level).

5. The presence of relatively high intercorrelations among the various
pairs of grade sets in the didactic, practical, and ward activities
suggests not only that a certain amount of communication may
take place among the instructors and supervisors to bring about
consistency in the evaluation of students in the courses, but also that
an underlying achievement-motivational pattern may be operative.

6. Although not reported in Table 1, the existence of a median correlation
coefficient of .79 in the set of 171 intercorrelations of the 19
items on the Ward Performance Scale, with only 19 coefficients
falling below .70 in a range of coefficients between .58 and .91,
suggests that an over-all or global rating of ward effectiveness may
well be as effective an indicator of ward success as a score based
on summation of rank-order scores on each of several characteristics.
The existence of an apparent tendency for a student ranking
high (or low) in the characteristics to be ranked high (or low)
in all other items probably represents the presence of a “halo” effect
in the judgments of supervisors in the ward.


¹ Appreciation is expressed to the Western Data Processing Center at the
University of California at Los Angeles for making time available to the writers
under its cooperative plan of institutional participation in non-profit research
activities.


REFERENCES 

Guilford, J. P. Psychometric Methods (Second Edition). New York:
McGraw-Hill Book Company, 1954.

Haney, Russell, Michael, William B., and Jones, Robert A. [...]

[Haney, Russell, Michael], William B., Jones, Robert A., and Gaddis,
L. Wesley. “Cognitive and Non-Cognitive Predictors of [...]”
EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, XIX (1960), 387-389.

Michael, William B., Jones, Robert A. [...] Student Nurses.”
EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, XIX (1959), 641-643.




A STUDY OF THE VALIDITY OF THE 
PRE-ENGINEERING ABILITY TEST 


REGINALD L. JONES 


Miami University 


Background. The Pre-Engineering Ability Test (PEAT) (Edu- 
cational Testing Service, 1952) is a paper-and-pencil test designed 
to predict performance in engineering and pre-engineering pro- 
grams. An outgrowth of the substantially longer Pre-Engineering 
Inventory (PEI), the PEAT is often included in a pre-admission 
battery of tests used for counseling and guidance purposes. While 
the validity of the PEI for these purposes has been established 
elsewhere (Lord, et al., 1950), no validity studies of the PEAT 
are available—save one study of the test’s estimated validity re- 
ported in the manual. This study presents additional data on the 
PEAT's validity. 

Inasmuch as the ultimate objective of the research was to
strengthen the selection battery of the pre-engineering program of 
the University, it seemed desirable to include as part of the analysis 
the one remaining test taken by the subjects (The American Col- 
lege Test, ACT (SRA, 1960), a college aptitude-achievement type 
test). The inclusion of both tests has thus made possible a study 
of their effectiveness, both comparatively and in combination, in 
predicting first semester grade-point average and grades in selected 
pre-engineering-relevant courses. 

Statistical Procedures. Computation of zero-order correlation
coefficients and of multiple correlation coefficients along with t
tests of the significance of differences between correlated correla-
tions have constituted the statistical techniques employed
to meet the objectives of the study.
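
The t test for the difference between two correlated correlations can be sketched as follows. This is a minimal illustration that assumes the Hotelling form of the test (t with N - 3 degrees of freedom), which is the formula usually given for this comparison; the article does not print the formula it used. The function and argument names are hypothetical. The inputs are read from Tables 1 and 2 for the mathematics comparison, on the assumption that .614 is the correlation of ACT Natural Sciences with the mathematics grade and .523 that of the PEAT.

```python
import math


def hotelling_t(r_xy, r_zy, r_xz, n):
    """t for the difference between r_xy and r_zy when predictors x and z
    are measured on the same n cases (assumed Hotelling form, df = n - 3)."""
    # Determinant of the 3 x 3 correlation matrix of x, z, and y.
    det = 1 - r_xy**2 - r_zy**2 - r_xz**2 + 2 * r_xy * r_zy * r_xz
    return (r_xy - r_zy) * math.sqrt((n - 3) * (1 + r_xz) / (2 * det))


# Assumed reading of Tables 1 and 2 (mathematics grade, N = 39):
# ACT Natural Sciences r = .614, PEAT r = .523, predictor intercorrelation = .617.
t_math = hotelling_t(r_xy=0.614, r_zy=0.523, r_xz=0.617, n=39)
print(round(t_math, 2))
```

Under these assumptions the sketch gives a t of about 0.81, close to the .817 reported in Table 2 for the mathematics comparison.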




Subjects. The 68 male subjects represented the population of 
pre-engineering students at a medium-sized midwestern state uni-
versity for whom complete data were available. Foreign students
(N = 2), students failing to complete the first semester (N = 4), 
and students with incomplete test scores or grades (N = 16) were 
excluded from the analyses. 

Predictor Instruments. The ACT, a recently developed group 
paper-and-pencil test, is designed to provide measures predictive 
of academic success in college and measures indicative of educa- 
tional development. The test battery includes tests in the follow- 
ing four areas (Spearman-Brown odd-even reliabilities in paren- 
theses): English (.88), mathematics (.88), social studies (.85), and 
natural sciences (.83). In addition, the test yields a composite 
score (.95). The subjects took this test in the spring preceding their 
freshman year of college. 

The PEAT, also a group paper-and-pencil test, contains two sec- 
tions: 1) comprehension of scientific materials, and 2) general 
mathematical ability. A total score is also obtained. This test was 
administered in the fall preceding the freshman year of college. 

Criteria. First semester grade-point average, and first semester 
grades in Freshman chemistry and Freshman mathematics were
the criteria of the study. Both mathematics and chemistry grades
were obtained in courses taught by several instructors and hence 
are subject to the usual shortcomings of data of this type. 

Inasmuch as the instructors were unaware of the status of the 
subjects (pre-engineering students are believed to be a cut above 
the average) and were, in addition, unaware of the test scores of 
these students, it is believed that criterion contamination was min- 
imal. No evidence on the reliability of the criteria is available. 

Results and Discussion. Means, standard deviations, and corre- 
lations between predictors and criteria are reported in Table 1. 
The data in this table show that both the ACT and the PEAT 
have moderately high correlations with all criteria, although the 
ACT or one of its subtests seems somewhat more effective as a 
predictor in two of the three comparisons. However, the zero order
correlations between the PEAT and the criteria were not signifi- 
cantly different from the zero order correlations between the ACT 
subtests and the criteria when these differences have been eval-
uated by the t test for differences between correlated correla-
tions (McNemar, 1955). These results are summarized in Table 2.


TABLE 1

Means, Standard Deviations, and Zero Order Correlations
Between Predictors and Criteria

                                  ---------------------- ACT ----------------------
                                            Mathe-    Social    Natural
                        PEAT      English   matics    Studies   Sciences  Composite

Mean¹                   47.84     20.78     27.13     22.76     25.72     24.21
S.D.                    10.77      3.74      4.58      4.58      4.23      3.11
Correlation of PEAT
and ACT Scores with:
  1st Semester GPA¹     .614⁴     .466      .537      .508      .478      .645
  Freshman Chemistry²   .637, .531, .600, .720  [two entries illegible; columns uncertain]
  Freshman Mathematics³ .523, .471, .614, .426  [two entries illegible; columns uncertain]

¹ N = 68.
² N = 51.
³ N = 39.
⁴ p < .01 for all correlations.


TABLE 2

Significance Tests of Differences in Correlations Between the PEAT and Criteria
and the ACT and Criteria

                                                       Correlation
                                                       Between
Comparisons                                      N     Predictors    t       p*

PEAT-Mathematics Grade
  vs.                                            39    .617          .817    N.S.
ACT-NS-Mathematics Grade

PEAT-Chemistry Grade
  vs.                                            [entries illegible]
ACT-C-Chemistry Grade

PEAT-1st Semester Grade-Point Average
  vs.                                            68    .756          1.60    N.S.
ACT-Composite-1st Semester Grade-Point Average

* p < .05 is adopted as a critical value.


The ACT composite score and the PEAT score were combined
by means of the Wherry-Doolittle Test Selection Method (Garrett,
1953) to maximize the prediction of first semester grade-point aver-
age. Although the PEAT entered the optimal battery, its presence
increased the shrunken R from only .638 to .651. A multiple R of
.673 was obtained.


Summary. This research was designed to answer two questions: 
a) Is the PEAT valid as a predictor of performance in a pre- 
engineering program, and b) is it more efficient as a predictor of 
the criteria than a general test of scholastic aptitude (ACT)?

Analyses of the data showed the PEAT to be valid as a predictor
of first semester grade-point average in the pre-engineering cur-
riculum (p < .01) as well as predictive of grades in selected pre-
engineering relevant courses (p < .01). In addition, the PEAT
predicted the above criteria as well as a general test of scholastic
aptitude.


REFERENCES 


Educational Testing Service. Pre-Engineering Ability Test Manual. 
Princeton: Educational Testing Service, 1952. 

Garrett, H. E. Statistics in Psychology and Education. New York: 
Longmans, Green and Co., 1958. 

Lord, F., Cowles, J. T., and Cynamon, M. “The Pre-Engineering 
Inventory as a Predictor of Success in Engineering Colleges.”
Journal of Applied Psychology, XXXIV (1950), 30-39. 

McNemar, Q. Psychological Statistics. New York: John Wiley and 
Sons, 1955. 

American College Testing Program. Technical Report 1960-61 Edi-
tion. Chicago: Science Research Associates, 1960.




COMPARING ZERO-ORDER CORRELATION FROM SCAT 
TOTAL AND MULTIPLE CORRELATION FROM SCAT Q 
AND V AT SOUTHERN ILLINOIS UNIVERSITY 


JOHN W. LEWIS 
Southern Illinois University¹


Problem. Bennett and Wesman (1955) and Juola (1961) have 
reported that a single-order regression equation derived from the 
summed total of the College Qualification Tests will predict college
success as accurately as the multiple regression equation derived 
from the CQT subtests. 

The purpose of the present study was to test the null hypothesis
of no difference between the zero-order correlation coefficient ob-
tained by correlating SCAT total score with college success and the
multiple correlation obtained from the multiple regression of col-
lege achievement on SCAT Verbal and SCAT Quantitative. The .05
level was adopted as the criterion for statistical significance.

Criterion. The criterion was first quarter total grade point aver-
age at Southern Illinois University (A = 5, B = 4, C = 3, D = 2,
E = 1).

Sample. The sample was 1,998 men and women who entered 
Southern Illinois University in the fall quarter of 1960 as first 
term freshmen. 

Results. The results presented in Table 1 read as follows. The
zero-order correlation for SCAT total with the criterion was .46
and accounted for .21 of the criterion variance. The multiple cor-
relation of SCAT Verbal, SCAT Quantitative, and the criterion
was .50 and accounted for .25 of the criterion variance. The null
hypothesis between the multiple coefficient and the zero-order co-
efficient was not rejected at the .05 level.


TABLE 1

Beta Weights, Multiple Correlation for SCAT V and Q (Variables 1 and 2), Variance
Accounted for by Multiple Correlation, Zero-Order Correlation Between
SCAT T (Variable 3) and Criterion (c), and Criterion Variance
Explained by Zero-Order Correlation. N = 1,998

            Beta    R(c.12)   R²(c.12)   r(c3)   r²(c3)

SCAT V      .33
                    .50*      .25        .46*    .21
SCAT Q      .32


¹ The writer is presently a graduate assistant at the State University of Iowa
Examination Service.

The multiple correlation coefficient was not corrected for shrink- 
age. Consequently, cross-validation procedures would probably re- 
duce the difference between the zero-order correlation and the mul- 
tiple correlation. 
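
The size of a formula-based shrinkage correction for these figures can be sketched as follows. This is only an illustration under stated assumptions: it uses one common adjustment, R² adjusted = 1 - (1 - R²)(N - 1)/(N - k - 1), which the article neither names nor applies, and it is not a substitute for the cross-validation the text refers to. The function name is hypothetical; R = .50, N = 1,998, and k = 2 predictors are taken from the article.

```python
import math


def shrunken_r(r, n, k):
    """Multiple R after an assumed adjustment for n cases and k predictors."""
    adj_r2 = 1 - (1 - r**2) * (n - 1) / (n - k - 1)
    # Guard against a negative adjusted R-squared in very small samples.
    return math.sqrt(max(adj_r2, 0.0))


# Values from the article: R = .50 from SCAT V and Q, N = 1,998.
r_adj = shrunken_r(r=0.50, n=1998, k=2)
print(round(r_adj, 3))
```

With a sample this large and only two predictors, the adjusted value stays just under .50, so a formula correction of this kind changes the coefficient very little; what an actual cross-validation sample would show is a separate question.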

In conclusion, the multiple correlation did account for more cri-
terion variance than did the zero-order correlation. But the failure
to reject the null hypothesis between the zero-order correlation
and the multiple correlation, and the possibility of shrinkage for
the multiple correlation coefficient, tend to minimize the
possibility of any statistically true difference existing between the
two coefficients.


REFERENCES 


Juola, Arvo, “The Differential Validity of the College Qualification 
Tests for Diverse Curricular Groups.” Personnel and Guidance 
Journal, XXXIX. (May, 1961), 722. 

Wesman, Alexander and Bennett, George, “Multiple Regression vs.
Simple Addition of Scores in Prediction of College Grades.”
EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, XIX (1959),




HIGH SCHOOL RECORD AND COLLEGE BOARD SCORES 
AS PREDICTORS OF SUCCESS IN A LIBERAL ARTS 
PROGRAM DURING THE FRESHMAN YEAR OF COLLEGE 


WILLIAM B. MICHAEL, ROBERT A. JONES, ANNA COX, 
ARTHUR GERSHON, MARVIN HOOVER, KENNETH KATZ, 
AND DENNIS SMITH 


University of Southern California. 


Problem and Subjects. It was the purpose of the study to deter-
mine the predictive validity of high school grade point average,
verbal scores, mathematics scores, and total (unweighted) scores 
of the Scholastic Aptitude Test of the College Entrance Examina- 
tion Board (CEEB), both individually and collectively, relative 
to a criterion of grade point average earned by 209 men and 233 
women during their 1960-61 freshman year in the College of Let- 
ters, Arts, and Sciences at the University of Southern California. 
Only those students were included in the sample who completed a 
total of 24 or more units of academic work during their entire first 
year and no fewer than 11 units during a given semester. 

Procedure. A correlational and multiple regression analysis was 
effected for each sex group through use of the IBM 7090 computer 
at the Western Data Processing Center at the University of Cali- 
fornia at Los Angeles. The statistical data are summarized in 
Table 1. 

Findings. An examination of numerical entries in Table 1 re-
veals the following information: (1) For both sexes high school
grade point average is more predictive of success in college than
either part scores or total scores of the CEEB. (2) A least squares
linear combination of high school GPA and CEEB total scores or
of high school GPA and differentially weighted verbal and quan-
titative CEEB scores yields a higher predictive validity (multiple
correlation coefficient) than does any one predictor. (For both
men and women the unreported beta weights associated with high
school GPA are approximately twice as great as those assigned to
the CEEB total score variable.) (3) The achievement of women
in the liberal arts college studies can be predicted with greater ac-
curacy than that of men.


TABLE 1

Intercorrelations of College and High School Grade Point Averages (GPA), CEEB Verbal,
Mathematics, and Total (Unweighted) Scores along with Pertinent Multiple Correlation
Coefficients for 209 Men (Entries Below Diagonal) and 233 Women (Entries Above
Diagonal) Who Entered the Liberal Arts College of the University of Southern
California as Freshmen During September, 1960

Variable                          (1)   (2)   (3)   (4)   (5)

1. College GPA (2 Semesters)      --    82    41    20    36
2. High School GPA                40    --    20    29    52
3. CEEB Verbal Score              25    27    --    42    --
4. CEEB Mathematics Score         23    20    41    --    --
5. CEEB Total Score (Unweighted)  28    28    --    --    --

Indexes of Multiple Correlation

Coefficient     Men            Women

R(1.34)         .28            .41
R(1.25)         [illegible]    .56
R(1.234)        .44            .61

All coefficients significant beyond the .01 level and all differences between
corresponding multiple coefficients for the two sex groups similarly significant.

Note. All decimal points omitted from zero order coefficients, each one of which
is significant beyond the .01 level.
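
The relation between the zero order coefficients in Table 1 and a two-predictor multiple correlation can be sketched as follows. This is a minimal illustration using the standard two-predictor formulas for standardized (beta) weights; it is not the authors' IBM 7090 program, and the function name is hypothetical. The inputs are the men's entries as read from Table 1 with decimal points restored: college GPA with high school GPA .40, college GPA with CEEB total .28, and high school GPA with CEEB total .28.

```python
import math


def two_predictor_fit(r_y1, r_y2, r_12):
    """Beta weights and multiple R for a criterion y and predictors 1 and 2."""
    denom = 1 - r_12**2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    # R-squared equals the sum of beta times zero order correlation.
    multiple_r = math.sqrt(beta1 * r_y1 + beta2 * r_y2)
    return beta1, beta2, multiple_r


# Men: predictor 1 = high school GPA, predictor 2 = CEEB total score.
beta_hs, beta_ceeb, r_men = two_predictor_fit(r_y1=0.40, r_y2=0.28, r_12=0.28)
print(round(beta_hs, 2), round(beta_ceeb, 2), round(r_men, 2))
```

Under this reading of the table the high school GPA weight comes out a little under twice the CEEB total weight, in line with the parenthetical remark in the Findings, and the multiple R is about .44.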




UTILIZING THE STEPWISE MULTIPLE REGRESSION 
PROCEDURE IN SELECTING PREDICTOR VARIABLES 
BY SEX GROUP 


JOHN W. LEWIS 
State University of Iowa 


Problem. The problem was to select from a battery of eleven 
predictor variables those variables which will yield the optimum
estimate of first quarter grade point average by sex group at South- 
ern Illinois University. The eleven predictor variables were: rank 
in high school graduating class; School and College Abilities Test 
(SCAT), Form 1-A, Verbal, Numerical, and Total score; Co- 
operative English, Form Z, Higher Level, English Grammar, Punc- 
tuation, Spelling, English Total, Reading Vocabulary and Read- 
ing Comprehension; and the Illinois Mathematics Placement, 
Form AA, Total. The criterion of first quarter grade point average 
was based on the five point system: A-5, B-4, C-3, D-2, E-1. 

Sample. The sample consisted of 1,158 men and 840 women who 
entered Southern Illinois University in the fall quarter of 1960 as 
first-term freshmen. 

Procedure. All variables were reproduced from original sources 
to a final deck of 80-space IBM cards. The statistical method was 
multiple correlational analysis by the IBM 650 Stepwise Multiple 
Regression Equation. In the stepwise procedure, the beta weight 
for each variable was tested against zero by the F test of variance 
at the .05 level. If the beta weight for the variable was not signifi- 
cant at the .05 level, the variable was not allowed to enter the mul- 
tiple regression equation. 

Results. Table 1 presents the matrices of zero order correlation
coefficients.




TABLE 1


Matrices of Zero Order Correlation Coefficients


Test SCAT SCAT Eng. Punct. Sp. Eng. В. R. Il. H. S.R. GE 
Variables Q. Т. G. T. Voc. Comp. Math. 
Males, N = 1,158 
SCAT V. 59 ОБГ ТО ag Oke "у А ce :32 .23 .35* 
SCAT Q. :88 1.45  .44 «85  .52 ..30 EY .94* 
SCAT T. 4161. COON AS — 764 1787 eget 23 .A1* 
Eng. G. 55 55 .85  .49 58 42 .29 .34* 
Punct. 35 .86 .31 48 41 .28 .25* 
Eng. Sp. -67 ٠ .44 39 29 .27 .24* 
Eng. T. .48 60 4T E .84* 
Rdg. Voc. 68 27 .19 .30* 
Rdg. Comp. 35 .25 .30* 
Ill. Math .38 .33 
H. S. R. .32* 
Females, N — 840 f 
SCAT V. 54 88 OE 47 47 61 84 71 40 30 .50 
SCATQ .87 52  .b0 .43 758 45 ..54 .64 .43 .49* 
SCAT T. :85, 66°62 1:89. REE rg ёр .42 .56* 
Eng. G. ;U8 FOR URS AUS. eno Ду E .43 .52* 
Punct. 45 .88  .40 48 34 27 .37* 
Eng. Sp. .45 .46 41 27 38 .42* 
Eng. T. .54 57 40 41 .51* 
Rdg. Voc. 70 33 20 .42* 
Rdg. Comp. 40 31 .AT* 
Ill. Math. 36 -38 
Н. 8. R. „47% 


* Significantly different between the .01 and .001 level.
** Significantly different beyond the .001 level.


For the male subjects, the zero order correlations between the
predictor variables and the criterion ranged from a high of .41 for 
SCAT Total to a low of .24 for Co-operative English Spelling. For 
the female subjects, the zero order correlations ranged from .56 
with SCAT Total to .37 with Co-operative English Punctuation. 

A sex difference appeared in the zero order correlations between 
the predictor variables and the criterion, with the female subjects 
yielding zero order correlations of larger magnitude. The null hy- 
pothesis for correlations between the predictors and the criterion 
for sex groups could be rejected beyond the .05 level for all corre-
lations, with the one exception of the correlations between Illinois
Mathematics Placement and the criterion.
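
A comparison of corresponding correlations from the two independent sex groups can be sketched as follows. The article does not name the test it used; this sketch assumes Fisher's r-to-z transformation, a common choice for two independent samples, and the function name is hypothetical. The correlations with the criterion (.41 and .56 for SCAT Total, .33 and .38 for Illinois Mathematics Placement) and the sample sizes (1,158 men, 840 women) are taken from the text and Table 1.

```python
import math


def fisher_z_diff(r1, n1, r2, n2):
    """Normal deviate for the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z2 - z1) / se


# Males first, females second.
z_scat = fisher_z_diff(0.41, 1158, 0.56, 840)   # SCAT Total
z_math = fisher_z_diff(0.33, 1158, 0.38, 840)   # Illinois Mathematics Placement
print(round(z_scat, 2), round(z_math, 2))
```

Under these assumptions the SCAT Total difference gives a deviate well beyond the two-tailed .001 point (3.29), while the Illinois Mathematics Placement difference falls short of the .05 point (1.96), which agrees with the one exception noted in the text.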


Table 2 presents the results of the Stepwise Multiple Regression
Equation by sex groups.


TABLE 2

Multiple Regression Equation for Addition of Successive Predictor Variables:
Multiple Correlation Coefficient, R², Coefficient of Alienation, Standard Error
of Estimate, Beta Weights and Proportion of Criterion Variance Explained
by Each Variable (C.V.)

                  R            R²         Alienation    S.E. est.       Beta          C.V.
Variable      M      F      M      F      M      F      M      F      M      F      M      F

SCAT T.       .41    .56    .17    .31    .91    .83    .75    .66    .20    .25    .082   .140
H.S.R.        .45    .62    .21    .38    .89    .79    .73    .63    .18    .25    .058   .118
Eng. G.       .46    .63    .21    .40    .89    .78    .73    .62    .11    .19    .037   .082
Math T.       .47           [illegible]   .88           .72           .08           .026
Rdg. Voc.     .48*          .23           .88           .72           .06           .018
Rdg. Comp.           .64*          .41           .77           .62           .11           .052

* Significantly different beyond the .001 level between the final multiple correlation for the
sex samples (.48 for males and .64 for females).


In both sex groups SCAT Total had the largest beta weight 
and consequently was the first predictor variable to enter the Re- 
gression Equation. In both sex groups, high school rank and Co- 
operative English had, respectively, the next largest beta weights 
and were the number two and number three predictor variables to 
enter the equation. For the male subjects, Illinois Mathematics
and Co-operative English Reading Vocabulary were the final pre-
dictor variables from the battery of eleven with statistically sig-
nificant beta weights. For the female subjects, Co-operative Eng-
lish Reading Comprehension was the only additional predictor
variable with a significant beta weight. 

Since a significant sex difference appeared in the magnitude of 
the relationship between the single predictor variables and first 
quarter grade point average, a significant difference appeared in 
the magnitude of the multiple correlation coefficients. The null hy- 
pothesis between the final multiple correlation of .48 (males) and 
.64 (females) was rejected beyond the .001 level of significance.
The greater accuracy of predicting the criterion for female sub- 
jects is illustrated in Table 2 by the smaller coefficient of aliena- 
tion (.77 for females as compared with .88 for males) and standard 
error of estimate (.62 for females as compared with .72 for males). 
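
The way the columns of Table 2 fit together can be sketched as follows. This is an illustration under stated assumptions, not the IBM 650 program: it assumes that each C.V. entry is the product of the final beta weight and the zero order correlation with the criterion, that R² is the sum of those products, and that the coefficient of alienation is the square root of 1 - R². The variable names are hypothetical; the male beta weights are read from Table 2 and the male zero order correlations from Table 1 and the text.

```python
import math

# Male sample: (final beta weight, zero order correlation with the criterion).
male_predictors = {
    "SCAT T.": (0.20, 0.41),
    "H.S.R.": (0.18, 0.32),
    "Eng. G.": (0.11, 0.34),
    "Math T.": (0.08, 0.33),
    "Rdg. Voc.": (0.06, 0.30),
}

# Assumed: criterion variance credited to each variable is beta times r.
cv = {name: beta * r for name, (beta, r) in male_predictors.items()}
r_squared = sum(cv.values())
multiple_r = math.sqrt(r_squared)
alienation = math.sqrt(1 - r_squared)

for name, value in cv.items():
    print(name, round(value, 3))
print(round(r_squared, 3), round(multiple_r, 2), round(alienation, 2))
```

On this reading the products reproduce the male C.V. column to rounding, and their sum gives an R² of about .22 and a coefficient of alienation of about .88, close to the .23 and .88 reported for the final male equation; the small gap is what two-decimal beta weights would be expected to produce.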

Summary. The purpose was to select a battery of optimum pre-
dictor variables for estimating the criterion of first quarter grade
point average by sex groups. The results justify the following
conclusions.


First, SCAT Total yielded the largest zero order correlation 
with the criterion. 

Second, SCAT Total, high school rank, and Co-operative Eng- 
lish Grammar accounted for all but .02 of the explained male cri- 
terion variance and all but .01 of the explained female criterion 
variance. However, the remaining explained criterion variance was 
accounted for by different predictor variables for each sex group. 

Third, utilizing the optimum battery of predictor variables, es-
timates of the criterion could be made with a statistically signifi-
cantly greater degree of accuracy for female subjects than for male
subjects.




BOOK REVIEWS


Edited by 


WILLIAM B. MICHAEL 
University of Southern California


Jenkins and Paterson’s Studies in Individual Differences. LOY S. BRALEY
Hubbard and Clemans’ Multiple-Choice Examinations in Medicine. PETER G. LORET
Hathaway and Monachesi’s An Atlas of Juvenile MMPI Profiles.
    BENJAMIN KLEINMUNTZ
Chernoff and Moses’ Elementary Decision Theory. JON W. [illegible]
Hemphill’s Dimensions of Executive Positions. HAROLD BORKO
Adams and Preiss’ Human Organization Research. EDWARD [illegible]
Sayles’ Behavior of Industrial Work Groups. HAROLD BORKO
Young’s Motivation and Emotion: A Survey of the Determinants of Human
    and Animal Activity. BENJAMIN KLEINMUNTZ
Fuller and Thompson’s Behavior Genetics. JOSEPH R. ROYCE
Cottle and Downie’s Procedures and Preparation for Counseling.
    S. MARVIN RIFE
Fleishman’s Studies in Personnel and Industrial Psychology.
    [illegible] HARRIS
Katona’s The Powerful Consumer. PETER F. MERENDA
Nunnally’s Popular Conceptions of Mental Health: Their Development
    and Change. BENJAMIN KLEINMUNTZ
Bjerstedt’s Glimpses from the World of the School Child. [illegible]
Honkavaara’s The Psychology of Expression: Dimensions in Human
    Perception. PATRICK J. CAPRETTA
Johnson, Stefflre, and Edelfelt’s Pupil Personnel and Guidance
    Services. GILBERT D. MOORE
Lofquist and England’s Problems in Vocational Counseling. [illegible]
Sanford’s Psychology, A Scientific Study of Man. JAMES W. [illegible]
Mussen’s Handbook of Research Methods in Child Development. [illegible]
Lee and Martin’s Human Psychological Development. E. [illegible]




Studies in Individual Differences by James J. Jenkins and Donald 
G. Paterson (Editors). New York: Appleton-Century-Crofts, 
Inc., 1961. Pp. v + 774. 

The title of this volume of readings is somewhat misleading so 
that the subtitle, The Search for Intelligence, communicates more 
accurately the nature of its contents. This excellent compilation of 
66 original, or adapted, papers focuses exclusively on theory and 
method in the area of intelligence. The papers are arranged chron- 
ologically, each with a brief editorial statement alerting the reader 
to the main contribution that the article makes. A further useful 
device that lends some measure of cohesion is the periodic incor- 
poration of well-chosen selections that aid in clarifying the direction 
and accomplishments of the other contributions up to that point 
in the chronology. 

With the exception of two papers by Galton, this volume pre- 
sents the 20th century odyssey in the quest for the Holy Grail of 
intelligence. It is a fascinating journey for the reader, as well as 
an interesting contribution to the history of ideas. Proceeding chro-
nologically through these selections (the most instructive ap-
proach!), one commences with the early view of intelligence that
closely approximates in depth and conceptual clarity the level of 
understanding an advanced undergraduate brings to the problem 
of intelligence. In moving through the selections one participates 
in the expanding theoretical and methodological sophistication 
marking the development of the field. Often such attempts as this 
go awry in presenting an honest appreciation of the evolution of 
an area of knowledge by showing only the notable achievements 
and omitting the unfortunate mutations. Happily, such is not the
case here for (to shift metaphors), along with a sense of advance-
ment along the highway of knowledge, journeys down the prim-
rose path (physique and intelligence, sensory discrimination, etc.)
are also presented. So too, amidst the course of inquiry into sub- 
stantive questions and increasing technical proficiency, one senses 
the excitement of workers tilling the fertile fields of racial and 
ethnic differences in intelligence; the heated acrimony associated 
with nature or nurture, e.g., “Our conclusion is, therefore, that the
Iowa statistical laboratory has played a far greater part in af-
fecting the ‘intelligence’ of children than has the Iowa nursery
school . . .” (Goodenough & Maurer), and the enthusiasm engen-
dered by each new development in testing technology.

The impetus to the testing movement given by the World War I
Alpha and Beta is evident, as is a sampling of the contributions
to understanding that these mass testing programs occasion in such
matters as the demographic dimensionality of population intellect.
The sociological and social-psychological investigations consequent
to the wholesale testing during both wars indicate something of
the mutual advantage that accrues.

Selections exemplifying the early conflict between Thorndike
and Spearman on the question of general versus specific factors in
intelligence provide an introduction to a series of papers portray-
ing the development of factor analytic techniques. It is here, per-
haps, that the student can get the clearest perception of the in-
teraction of theory and technique. It is in connection with factor
analysis, as well, that the need for a certain level of statistical
comprehension is most evident if a number of the selections are
to be thoroughly understood. But, of course, a book of readings
always requires supplementation. In addition to bringing the factor
analytic approach up to date with Guilford’s recent contribution
(Three Faces of Intellect), something of the growing interest in
the “nonintellective” components of intelligent behavior is sug-
gested in selections by Gough and by Wechsler.

This book of readings, then, provides a splendid historical frame- 
work within which to view the emergence of an important area of 
psychological research and theory. The selections have been ju-
diciously chosen, and while the usual shortcomings of a collection
of papers are evident in matters of brevity, loss of integration, and 
failure of issues raised in several papers to coalesce, this only per- 
mits greater scope for the individual instructor to impose his own 
structure (or bias) and to shape the material into his particular 
course as he sees fit. 

LOY S. BRALEY


University of California, Santa Barbara 


Multiple-Choice Examinations in Medicine by John P. Hubbard
and William V. Clemans. Philadelphia: Lea and Febiger, 1961.
Pp. 186. $3.75.


This book is intended to serve both as a guide for the examiner
in the construction and analysis of multiple-choice examinations


Graduates. Considering the natural limitations of a relatively short 
book, Hubbard and Clemans are to be congratulated on their suc- 




important part in the medical schools of the United States in the 
past two decades. There has long been a need for such a systematic 
guide, particularly one which is directed primarily to a medical 
audience. 

Three of the eight chapters are devoted to a discussion of various 
types of test questions, their rationale, and the test construction 
process. The illustrative items presented should prove particularly
helpful to the reader, as should the discussion dealing with the re- 
writing of items. This section concludes with a list of standards 
which each test item should satisfy; however, this reviewer wishes 
that some specific comment might have been made about the use 
and abuse of “None of the above” as an option, since the novice 
item-writer often falls into the trap of using this option much
too frequently. It would also seem desirable to have included in 
this section some discussion of the need for including as much in- 
formation as possible in the stem (leading clause) of an item. Un- 
fortunately, several of the examples in the appendix are character- 
ized by the use of a stem which requires that the examinee read 
each and every option in order to determine what is really being 
asked; for example, “The etiologic agent of infectious hepatitis” 
or “As a result of cutting one hypoglossal nerve.” It might have
been valuable to point out this pitfall to the reader and to stress 
the “rule” that the stem of an item ought to present either a com- 
plete question or as much of the problem as is possible. 

A brief but excellent discussion relative to the scoring and anal- 
ysis of multiple-choice tests is presented in Chapters 4 and 5. 
Chapter 6, which is devoted to a discussion of multiple-choice and 
essay tests, is, with its annotated bibliography, an excellent resumé
of the pros and cons of the two types of tests; the presentation of 
an essay question in pathology with the criteria used for grading, 
followed by a series of multiple-choice questions covering the same 
material, is particularly illuminating. 

The use of test scores for group comparisons both within and
between institutions is treated in the seventh chapter, while Chap- 
ter 8 describes the examination prepared for the Educational Coun- 
cil for Foreign Medical Graduates. 

Outlines of subject matter for individual NBME tests are in- 
cluded in the appendix, as is a sample examination of 170 items for 
which an answer sheet and key are provided. 

This book will undoubtedly prove to be extremely valuable to 
the examiner who seeks to construct and administer multiple-choice 
examinations in medicine and related fields. Its value as a guide 
to the examinee, however, may be somewhat overemphasized; this 
latter function might perhaps be more appropriately fulfilled by a 
brief but comprehensive bulletin of information for the candidate. 

PETER G. LORET
Educational Testing Service 




An Atlas of Juvenile MMPI Profiles by Starke R. Hathaway and 
Elio D. Monachesi. Minneapolis: The University of Minnesota 
Press, 1961. Pp. xviii + 402. $8.00.

The MMPI was developed in a psychiatric setting for the ex-
press purpose of providing scores “that are commonly characteris-
tic of disabling psychological abnormality.” However, the extension
of the inventory’s use to include its application to normal per- 
sons was anticipated by the test authors. “Although the scales are 
named according to the abnormal manifestation of the sympto- 
matic complex, they have all been shown to have meaning within 
the normal range" (Hathaway & McKinley, 1951). It is within this 
latter area of application that An Atlas of Juvenile MMPI Profiles 
makes its major contribution. Hathaway and Monachesi “hope 
and expect that this Atlas will far more often help to show the 
strength of the youth than to emphasize the weaknesses." 

From a sample of approximately eleven thousand Minnesota 
boys and girls, Hathaway and Monachesi selected 1,088 MMPI 
profile codes and with the help of field workers, ministers, and par- 
ents furnished descriptive case histories for each of the test pro- 
files. The selection of cases in this reference work differs from the 
one used by Hathaway and Meehl in Atlas for the Clinical Use 
of the MMPI (1951). In the latter instance, selection started with 
the descriptive and case history data, whereas in the current book 




There is no question that Hathaway and Monachesi's Atlas will 
be an invaluable tool for the worker who is trained in the use of 
the MMPI and whose daily activities bring him in contact with 
adolescents. For those persons who work with juveniles and who 
feel that they would like to use the Atlas, but who are unfamiliar 
with MMPI jargon and folklore, the following basic sources for 
consultation are suggested: Basic Readings on the MMPI in Psy- 
chology and Medicine (Welsh & Dahlstrom, 1956); An MMPI 
Handbook (Dahlstrom & Welsh, 1960); An Atlas for the Clinical 
Use of the MMPI (Hathaway & Meehl, 1951); and An MMPI 
Code Book for Counselors (Drake & Oetting, 1959). Persons who 
are familiar with the use of the MMPI, but are not accustomed 
to gleaning personality information from an Atlas, may find Meehl’s 
work (1956) and Duker’s recent contribution (1958) helpful. 


REFERENCES 

Dahlstrom, W. G. and Welsh, G. S. An MMPI Handbook: A Guide
to Use in Clinical Practice and Research. Minneapolis: Univer- 
sity of Minnesota Press, 1960. 

Drake, L. E. and Oetting, E. R. An MMPI Codebook for Counsel-
ors. Minneapolis: University of Minnesota Press, 1959.

Duker, Jan. “The Utility of the MMPI Atlas in the Derivation of 
Personality Descriptions." Unpublished doctoral dissertation, 
University of Minnesota, 1958. 

Hathaway, S. R. and McKinley, J. C. The Minnesota Multiphasic 
Personality Inventory Manual (Revised). New York: Psycho- 
logical Corporation, 1951. 

Hathaway, S. R. and Meehl, P. E. An Atlas for the Clinical Use
of the MMPI. Minneapolis: University of Minnesota Press, 1951.

Meehl, P. E. “Wanted—A Good Cookbook.” American Psychologist, 
XI (1956), 263-272. 

Welsh, G. S. and Dahlstrom, W. G. Basic Readings on the MMPI 
in Psychology and Medicine. Minneapolis: University of Min- 
nesota Press, 1956. 

BENJAMIN KLEINMUNTZ 
Carnegie Institute of Technology 


Elementary Decision Theory by Herman Chernoff and Lincoln E. 
Moses. New York: John Wiley & Sons, 1959. 

This book is intended for use by the mathematically naive stu- 
dent (high-school algebra and trigonometry are fully adequate) 
in a beginning course in statistics. Because it presents statistics 
from a specialized point of view, that of maximizing utility when 
making decisions based on uncertain evidence, some instructors 
may prefer to use it in a later class, leaving the elementary course 
for a more eclectic treatment of hypothesis testing and estimation 
procedures, including routine tests such as two-group t tests, anal-
ysis of variance, and chi-square, none of which are covered by
Chernoff and Moses. (These authors promise a successor to Elemen-
tary Decision Theory which will have a conventional coverage with
a decision theory flavor.) The general reader will also find it a 
fine introduction to several topics rarely presented at a precalculus 
level: admissibility of tests and strategies (chapter 1), utility (chap- 
ter 4), convex sets (chapter 5), Bayes’ theorem and Bayes’ strate- 
gies (chapter 5 and thereafter), sequential analysis (chapter 9), 
the method of moments (chapter 10), maximum likelihood estima- 
tion (chapter 10), invariant estimators (chapter 10), sufficient es- 
timators (chapter 10), consistent estimators (chapter 10), and 
efficient estimators (chapter 10). 

In two respects the book reflects the immaturity of the field of 
decision theory. The treatment of utility is interesting in its own 
right but has almost no influence on later sections, where loss and 
regret are characterized neither in dollars nor in utility units. Cor- 
respondingly, the concluding two chapters, though presenting a very 
modern treatment of the foundations of statistical theory, use 
Bayes’ strategies only programmatically because of difficulty in 
determining the values of regret or loss functions and of a priori 
probability. The upshot appears to be this: The conventional ap- 
proaches of Fisher and of Neyman are always available; since 
they are specializations of decision theory one should leave them 
whenever regret or loss estimates are sufficiently clear to make 
more general procedures applicable. For psychologists, however, 
almost any use of decision theory in statistics will be an advance 
beyond the status quo and should be welcomed. 

This book seems to meet the goals of its authors almost perfectly. 
Typographical errors are almost nonexistent, and coverage is ex- 
cellent. Because its subject matter is extraordinarily difficult, com-
pared with most texts of this level, readability is at a premium here.
Occasionally, as in Chapter 5, students seem to find the writing 
style obscure. Less exposition and more statements of definitions, 
postulates, corollaries, and theorems might be helpful. This re- 
viewer would also appreciate a more explicit emphasis of the fact 
that the boundedness of the utility function follows (p. 106 and 
p. 352) from the empirical boundedness of utility of the St. Peters- 
burg game. 

One notational convention used in this book commends itself 
most highly. The authors use $d_x^2$ to represent the variance as esti-
mated when using a divisor of $N$ and $s_x^2$ to represent the corres-
ponding unbiased estimator using a divisor of $N - 1$. General
adoption of this convention would clarify statistical instruction 
considerably. 
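
The distinction between the two divisors can be checked numerically. What follows is a minimal sketch in Python, not taken from Chernoff and Moses; the function names and the sample scores are hypothetical and chosen only for illustration. It computes the variance of one set of scores with a divisor of N and again with a divisor of N - 1.

```python
def variance_divisor_n(values):
    """Variance estimated with a divisor of N (hypothetical helper name)."""
    n = len(values)
    if n < 1:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def variance_divisor_n_minus_1(values):
    """Unbiased variance estimator with a divisor of N - 1 (hypothetical helper name)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


# Illustrative scores only; the sum of squared deviations from the mean is 32.
scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

biased = variance_divisor_n(scores)            # 32 / 8 = 4.0
unbiased = variance_divisor_n_minus_1(scores)  # 32 / 7
print(biased, unbiased)
```

The two results differ only by the factor N / (N - 1), so the gap matters mostly for small samples.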

In summary, Elementary Decision Theory is the only precalcu- 
lus statistics text treating decision theory and one of the very 
few with a sophisticated treatment of statistical theory. It deserves
a wide reading by psychologists and educators as well as among
its intended student users.

JOHN W. COTTON 

University of California, Santa Barbara 


Dimensions of Executive Positions: A Study of the Basic Char- 
acteristics of the Positions of Ninety-Three Business Executives 
by John K. Hemphill. Bureau of Business Research Monograph 
Number 98. Columbus, Ohio: Ohio State University, 1960. Pp. 
xiv + 103. 

This soft-covered 103-page monograph is one of the Ohio Studies 
in Personnel. The research is part of the Executive Study program 
which is being conducted by the Educational Testing Service with 
the cooperation of a number of companies. 

How does one study the characteristics of an executive? There
are many answers to this question, but one would be hard-pressed 
to come up with a more detailed, more scientifically precise method 
than that reported by Hemphill. Ninety-three executives were se- 
lected as representative of different managerial positions. They 
were given a questionnaire of 575 items, each of which is scored 
as an 8-category scale: 0 if the item is definitely not part of the
position, and 7 if it is a most significant part of the position. 
Product moment correlation coefficients were computed for each 
pair of the 93 positions. The matrix was divided and factor anal- 
yzed, using a procedure described by Tucker for the determina- 
tion of inter-battery factors. Ten factors were extracted and 
rotated orthogonally for simple structure. These factors were 
interpreted as dimensions characterizing executive performance 
and labeled as staff service, supervision of work, business control, 
and the like. 
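
As a rough illustration of the first computational step only, the product moment correlations between pairs of positions, a minimal Python sketch follows. The ratings, the position labels, and the function names are hypothetical; the monograph's data and programs are not reproduced here, and the inter-battery factoring and rotation are not attempted. With 93 positions there would be 93 x 92 / 2 = 4,278 such pairs.

```python
from math import sqrt


def pearson(x, y):
    """Product moment correlation between two equally long rating vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / sqrt(sxx * syy)


def position_correlations(ratings):
    """Correlate every pair of positions across their item ratings."""
    names = sorted(ratings)
    return {
        (a, b): pearson(ratings[a], ratings[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical 0-to-7 ratings on five items for three positions.
ratings = {
    "position_a": [7, 5, 0, 2, 6],
    "position_b": [6, 4, 1, 2, 7],
    "position_c": [0, 1, 7, 6, 1],
}

for pair, r in position_correlations(ratings).items():
    print(pair, round(r, 3))
```

Correlating positions over items, rather than items over positions, is what makes the resulting factors describe groupings of positions.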

The study continues beyond the routine factor analysis through 
the construction of a new questionnaire consisting of 191 of the 
575 items. The choice of items was determined by the factor weights 
of these items on the two batteries. This new shortened scale is 
available for use, together with a profile for plotting the executive 
position. 

Appendices contain the position title of the 93 subjects, the 575- 
item questionnaire, and the 191-item revision, as well as the fac- 
tor names and the unrotated factor loadings. 

A few technical details in the study are not reported with suffi- 
cient clarity to facilitate replication. For example, it is stated that 
the computations were “carried out with the aid of electronic data 
processing machines,” but the type of computer and programs are 
not mentioned. In the factor analysis program, the value substi- 
tuted into the diagonals is left in doubt. We are told that the ten 
factors were rotated orthogonally, but it is not clear as to whether
an analytic or graphic method of rotation was used. Of somewhat
more importance is the unresolved question as to whether the 8- 
point scale responses to the 575 items were normalized prior to the 
computation of the correlation coefficient. Finally, the factor scor- 
ing procedures for developing the revised questionnaire are quite 
vague. These, however, are mild criticisms, for it is recognized that 
the author is writing a monograph and that information of a highly 
detailed and specific nature would be available, if requested. The 
research is a good solid piece of work and a significant addition to 
the literature. It contributes to the understanding of the dimensions 
of executive performance and outlines a methodology which will 
be useful for further studies in this and related areas. 

HAROLD BORKO

System Development Corporation 


Human Organization Research by Richard N. Adams and Jack J.
Preiss (Editors). Homewood, Illinois: The Dorsey Press, Inc., 
1960. Pp. xvii + 456. 

Some ten years ago the Society for Applied Anthropology ini- 
tiated in its journal, Human Organization, a new section dealing 
with Field Methods and Techniques. This new department reflected
a growing awareness of the benefit novice field workers would de-
rive from a knowledge of the problems encountered by more ma-
ture field workers. During the intervening years, an increasing
number of papers have dealt with problems specific to field re-
search among social groups.

Of the 32 papers which Adams and Preiss have selected for in-
clusion in the present volume, most have appeared in Human Or-
ganization in substantially the same form, though two appeared
in other journals and six are new papers requested by the editors.
Two-thirds of the contributors are sociologists, a reflection, per- 
haps, of the fact that anthropologists traditionally have been less 
concerned with formal research methodology. Adams and Preiss 
have divided the papers into two types: those dealing primarily 
with field problems involving interpersonal relations, and those 
dealing with field problems involving methodology. 

Critical interpersonal relations exist among members of a re- 
search team, between researcher and client, and between researcher 
and informant. For instance, while the growing trend toward in-
terdisciplinary research in the social sciences is producing results
which extend the contributions of individual research workers, there 
are interpersonal problems in team research which need to be made 
explicit. Several papers in Human Organization Research are di-
rected toward the problems which face members of a research
team. Members of the team may expect too rapid an accomplish-
ment and each member may be under pressure to produce. The
problem is compounded further in those cases in which one mem-
ber, being perceived by the others as an expert in his discipline,
has placed on him an onus by the others to take a definitive the- 
oretical position. Such pressure may lead to conservatism and stifle 
creativity. Similar problems and pitfalls are discussed in several 
papers, and more frequently than not the authors exemplify their 
discussions with case histories. 

The second part of the book is devoted to research methodology 
in applied anthropology. There exists no lack of adequate texts on 
research methodology—the design of a study, and the acquisition 
and analysis of data—but such texts often fail to prepare the stu- 
dent for the possibility that his first field research may not mold 
itself comfortably within the framework of formal methodology. 
What is needed by such students is material which discusses ways 
in which classroom methodology is adapted to field research. This 
need is fulfilled by the second half of Human Organization Re- 
search, which emphasizes techniques and methodology appropriate 
to actual field research. Abundant case material is used by most 
of the authors, and the topics covered range from projective tech- 
niques to survey methods. 

The reader should not expect uniformity in a book which sam- 
ples the thinking of 48 contributors. But the reader who wishes to 
gain an appreciation of the breadth and richness of current re- 
search undertaken within a cultural context is not likely to be 
disappointed in Human Organization Research. 

EDWARD LEVONIAN 
University of California, Los Angeles 


Behavior of Industrial Work Groups by Leonard R. Sayles. New 
York: John Wiley and Sons, 1958. Pp. viii + 182. $4.75. 

The description and analysis of industrial work groups reported 
in this volume are based upon the author’s extensive field opera- 
tions over a period of years. He points out that the relatively sim- 
ple concept of the informal work group has persisted since the 
early Western Electric studies, and that these concepts should now 
be examined, revised, and brought up to date. Sayles, taking his 
own advice, examines work group behavior in order to determine
why one group encourages aggressive informal leadership while
another does not; why one group fights management policy while
another accepts it; or why one group is characterized as coopera-
tive and another as troublesome.

In this study, industrial work groups are divided into four types 
based upon the salient feature of their behavior in relation to man- 
agement and the union. These groups are called (1) Apathetic, 
(2) Erratic, (3) Strategic, and (4) Conservative. The degrees and
kinds of group activities manifested by these four groups are re-
lated to various characteristics of the job and work situation. By
drawing upon his extensive and varied experience, the author pro-
vides many illustrations of the different behavior patterns and in-
dicates some significant relationships.

For example: “Interdependence in the work process (among 
members of a work group) tends to be associated with the more 
spontaneous, sporadic kinds of outbursts. Sustained activity, which 
seems to be the product of carefully thought-through, long-run 
objectives, is more characteristic of independent, individual opera-
tions than of crew and assembly lines” (p. 91).

These insights could provide both labor and management with 
information of use in promoting stable and mutually satisfying 
labor relations. For the applied industrial psychologist, Sayles’ con- 
cepts should provide an appreciation of the complexity of industrial 
work groups and prevent simple-minded overgeneralizations. The 
research-oriented worker in the area would, on the other hand, 
tend to adopt an attitude of mild but interested skepticism, until 
this explanation of industrial work group behavior is supported by 
experimental evidence or by repeated independent observations. 

HAROLD BORKO 
System Development Corporation
Santa Monica, California 


Motivation and Emotion: A Survey of the Determinants of Hu- 
man and Animal Activity by Paul Thomas Young. New York: 
John Wiley & Sons, 1961. Pp. xxiv + 648. $10.75. 

As a by-product of Professor Paul Thomas Young's desire to re-
port his experimental research findings on the hedonic theory of
motivation, psychology is presented with a systematic survey of
the determinants of human and animal behavior. Motivation and
Emotion appropriately combines into one textbook the author’s
two earlier volumes: Motivation and Behavior (1936) and Emo-
tion in Man and Animal (1943). With the possible exception of
Bindra's Motivation: A Systematic Reinterpretation (1959), P. T.
Young's book stands alone in its up-to-date and rather complete 
coverage in one volume of the two closely related topics of motiva- 
tion and emotion. 

The two pivotal chapters, around which the author seems to 
have organized this book, are called “Affective Arousal and Acti- 
vation” (Chapter 5) and “Incentive Motivation” (Chapter 6). 
Basing his theory of motivation on the principles that affective 
processes activate behavior, sustain or terminate activities, regu- 
late the pattern of behavior, facilitate or inhibit instrumental acts, 
and organize neurobehavioral patterns of approach and withdrawal, 
P. T. Young eliminates as crucial the drive reduction theory of re-
inforcement and rejects the hypothesis that the consummatory re-
sponse is of critical importance in reinforcement. Also the author
disregards what he calls the “probability concept of Spence, Skin-
ner and others” in favor of the recognition of the central role of
affective processes in learning and motivation. In these chapters
the author presents a critique of the constructs and systems he
chooses to disregard, and he documents his arguments with findings
from his own laboratory.

Motivation and Emotion is organized into twelve chapters. The 
first chapter contains a discussion of causation and free will and 
presents the author’s views on the definition of psychology in re- 
lation to the study of motivation and emotion. Chapter 2 contains 
a discussion of the principal forms of activity, and Chapters 3 and 
4 deal with such basic biological determinants as drive, homeo-
stasis, and organic need. Chapters 5 and 6, as mentioned above,
describe P. T. Young's research on the role of affective processes 
in motivation and learning. 

As distinct from activation and/or arousal, Chapters 7 and 8 
deal with the regulation and direction of activity. Here the reader 
will find discussions of purposive behavior and the roles of tension, 
effort, determining set, environmental factors and the role of cog- 
nitive organization in the regulation of behavior. 

Chapter 9, “Nature and Bodily Mechanisms of Emotion,” is the
book’s main chapter on emotion and is in large part culled from 
the author's earlier book, Emotion in Man and Animal (1943). 
Chapters 10, 11, and 12 deal respectively with the development of 
motives and emotion, the social and personal determinants of be- 
havior, and the relation of motivation and emotion to frustration, 
conflict, stress, and neurosis. An evaluation and overview of the 
complete area covered in the text is presented in the form of a 
short chapter called “General Conclusion.” Some helpful sugges- 
tions are made in this chapter to the student of motivation and af- 
fective processes who plans to do research in the area. Rounding 
out the volume is a “Questions and Exercises” section, a list of 
references containing about 550 works cited in the text (and con- 
veniently the pages on which the citations occur) and a subject 
index. 

The intended readers of this textbook are primarily students in 
advanced courses on motivation and affective process. In some 
measure the author has satisfactorily met the needs of such stu- 
dents, but there are several areas of considerable and current in- 
terest in motivational psychology that are omitted or treated in- 
sufficiently. No mention is made, for example, of J. V. Brady and 
H. F. Hunt's excellent research on the conditioned emotional re- 
sponse (CER), nor is there any mention of Amsel’s studies on the 
motivational properties of frustration. The work of Hull, Spence, 
Skinner, Mowrer and Tolman receives perfunctory treatment; and 




courses in motivational psychology, but obviously those instructors 
who use it will find it necessary to supplement it liberally with 
other books and references.

BENJAMIN KLEINMUNTZ 

Carnegie Institute of Technology 


Behavior Genetics by John L. Fuller and W. Robert Thompson. 
New York: John Wiley & Sons, 1960. 

This book is an important landmark in the history of interdisci- 
plinary research linking genetics and psychology, for, as the first 
book on the subject, it represents a summary of a field sufficiently 
developed to deserve this kind of treatment and is at the same 
time symbolic of a promising future for the field of behavior 
genetics. Fuller and Thompson have done a superb job of organiz- 
ing a domain which is central to psychology, but a domain which 
has lain dormant for some years because of methodological limita-
tions and a typically American Zeitgeist which has been dominated


the domain of behavior genetics. This includes a review of Men- 
delian principles, linkage, the relevance of physiological genetics 
by way of explaining what intervenes between the genotype and the 
phenotype, population genetics, and polygenic systems. The chapter
on designs for genetic research appropriately stresses the model for
polygenic traits, which involves the cumulative effects of genes
rather than the discrete Mendelian phenotypes which are always
included in the introductory psychology texts. The authors also
provide a clear exposition of the two relevant mating systems for
psychological research: selection, and the development of highly




thors say about the importance of genes in determining behavior? 
Nothing startling, but the situation is much more nearly apparent 
now, and some new tools have been sharpened for future work. The 
authors have compiled convincing evidence to show that genes are 
involved in all aspects of behavioral variation, but it is unfortu- 
nately all too clear that in very few cases (e.g., feeble-mindedness 
due to phenylketonuria) has it been possible to spell out the details 
as to exactly how these influences occur. 

The last chapter contains the authors’ attempt to provide us with 
a theoretical model. It “implies multiple-factor control of psy- 
chological traits and the existence of complex gene interactions in 
the development of phenotypes.” In other words, it is a non-Men-
delian or polygenic model.

The quantitative psychologist will find a variety of mathematical- 
statistical problems of interest. First of all there are the standard 
problems which are peculiar to genetics per se—for example, prob- 
ability formulations for estimating gene frequencies under a variety 
of conditions such as dominance, recessiveness, multiple alleles, etc.
Perhaps of greater interest are the varieties of variance and 
covariance analyses. There is, for example, Holzinger’s coefficient 
of heritability for determining the proportion of genetic variance 
within families; Cattell’s multiple variance equations, which break 
down the total variance into a variety of combinations of genetic 
and environmental variance in such terms as monozygotic twins 
reared together, monozygotic twins reared apart, dizygotic twins 
reared together, apart, siblings reared together, apart, etc.; and the 
intra-class correlation coefficient, which is another measure of trait 
heritability. 
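
Two of the measures named above can be made concrete in a short sketch. The Python below is illustrative only and is not quoted from Fuller and Thompson: it assumes Holzinger's index in its commonly cited form, H = (r_MZ - r_DZ) / (1 - r_DZ), and a one-way intra-class correlation for twin pairs. The twin scores and function names are hypothetical.

```python
def intraclass_correlation(pairs):
    """One-way intra-class correlation for pairs (two members per family)."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    k = 2  # members per pair
    grand_mean = sum(a + b for a, b in pairs) / (n * k)
    pair_means = [(a + b) / k for a, b in pairs]
    ms_between = k * sum((m - grand_mean) ** 2 for m in pair_means) / (n - 1)
    ms_within = sum(
        (a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, pair_means)
    ) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("no variation in the scores")
    return (ms_between - ms_within) / denominator


def holzinger_h(r_mz, r_dz):
    """Holzinger's index, assuming the form (r_MZ - r_DZ) / (1 - r_DZ)."""
    if r_dz >= 1.0:
        raise ValueError("r_dz must be below 1")
    return (r_mz - r_dz) / (1.0 - r_dz)


# Hypothetical trait scores for monozygotic and dizygotic twin pairs.
mz_pairs = [(10, 11), (14, 13), (18, 19), (22, 21)]
dz_pairs = [(10, 13), (14, 11), (18, 21), (22, 19)]

r_mz = intraclass_correlation(mz_pairs)
r_dz = intraclass_correlation(dz_pairs)
print(round(r_mz, 3), round(r_dz, 3), round(holzinger_h(r_mz, r_dz), 3))
```

The index rises as monozygotic pairs resemble each other more closely than dizygotic pairs do; with samples this small it is a demonstration of the arithmetic and nothing more.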

Factor analysis is also relevant, empirically by providing data 
on the inheritance of the Primary Mental Abilities, methodologi- 
cally by providing a means for identifying units of complex be- 
havior, and conceptually in terms of a multi-factor genetic-behavior 
theory of variation. 

Students of behavior genetics will long be grateful to geneticist-
physiologist Fuller and psychologist Thompson for providing such
a thorough and clear statement of the present condition of their 
special area of competence. 

JOSEPH R. ROYCE
University of Alberta 


Procedures and Preparation for Counseling by William C. Cottle 
and N. M. Downie. Englewood Cliffs, New Jersey: Prentice- 
Hall, Inc., 1960. Pp. 330. 

A persistent need in guidance, especially among beginning sec- 
ondary school counselors, is the means for integrating the data ac-
cumulated on individual students into the counseling process. This
work is an up-to-date attempt to meet this need by drawing to-
gether “from a number of sources the kinds of information and the 
type of preparation the counselor will need prior to a series of 
counseling interviews.” This volume is not intended to be a text in
counseling theory as such, but a source book of precounseling tech- 
niques. To be sure, the initial interview is the subject of Chapter 
four, but this is included because it is considered an important part 
of establishing working relationships between counselor and client 
for subsequent counseling. 

The book is sequentially structured to provide the counselor with 
an orderly approach to the data he needs to use with his counselees. 
Since the purpose of the book is rather broadly for resource use by 
such persons as teacher-counselors, school counselors, personnel 
workers, rehabilitation counselors, and employment service coun-
selors, the authors have compromised somewhat by including both
introductory and advanced materials. It would be a matter of the
reader’s own perceptions as to whether or not a “happy medium”
has been reached.


Some readers of the two chapters on statistics (6 and 7) might 
react with a question concerning the wisdom of their inclusion in 
this book. Some might find them too elementary, and others beyond 
their understanding. It is to be assumed that the authors have in- 
cluded these chapters as necessary to the appreciation of the total


The chapters on selection and use of standardized tests, evalua- 
es, interests, and personality assess- 


made. Their faith in the relative validit 
taken by an individual who understands and accepts the voluntary
counseling situation, would appear to be reasonably well substan-
tiated in experience.

A final chapter on “The Counselor's Research" is heartening. A 
wholesome attitude is taken toward the acceptability of the “grass 
roots” operations research which might characterize the investiga- 
tions of the average school counselor, in contrast to a more fore-
boding and pretentious kind of investigation. Much concrete help
and examples are given for these endeavors in this chapter. 

This is a sound book, produced by two men who are well grounded 
in both theory and practice. Although limited somewhat as a basic
text for any given graduate course in guidance, it should be of con- 
siderable value as a ready reference for the counseling practitioner 
in the field and as a guide-book for others seeking to enter the 
counseling profession. It constitutes a helpful synthesis of materials 
that might otherwise be scattered in a variety of sources. 

S. MARVIN RIFE
University of Rhode Island 


Studies in Personnel and Industrial Psychology by Edwin A. Fleish-
man (Editor). Homewood, Ill.: The Dorsey Press, 1961. Pp. xi
+ 633. $7.00. 

This book consists of a compilation of 66 previously published 
articles in the area of industrial psychology. The articles are grouped 
into nine sections—from Personnel Selection to Human Engineering 
with seven other pertinent categories in between. Each section is in-
troduced by a brief discussion of the section topic and the relevance
to the topic of the included articles.

Dr. Fleishman will undoubtedly receive many compliments for 
his fine workmanship. The articles appear to have been selected with 
considerable care and insight for easy grouping into sections; they 
complement each other nicely. The section introductions are clear 
statements of what the sections include and why the included arti- 
cles were selected. 

There may be some questions concerning what the book could be 
used for. The suggestion made in the preface that the book could 
serve as a basic text may cause a few people, in addition to the re- 
viewer, to look around for a soap box from which to air a prejudice. 
Basic texts should provide a meaningful framework from which the 
student can hang subsequently obtained bits of information. The 
integrated approach required to achieve this end is not to be found 
in a collection of independently written articles. 

The suggestion that the book be used as a supplementary text for 
assigned readings seems more reasonable. Many of the articles 
would enrich a course in industrial psychology. There is even a 
question here, however, of whether or not the student should be 
asked to pay seven dollars for a privilege he enjoys with his library
card. At least 60 of the 66 articles can be found in a college library.
Forty-six of the articles, in fact, can be found in the issues of just 
three journals: Personnel Psychology, Personnel, and the Journal
of Applied Psychology. 

DOUGLAS HARRIS


Human Factors Research, Inc. 
Los Angeles, California 


The Powerful Consumer by George Katona. New York: McGraw- 
Hill Book Company, 1960. Pp. ix + 276. $6.50. 

This unusual book by Katona represents the end product of nearly 
fifteen years of work in the Economic Behavior Program of the Uni- 
versity of Michigan Survey Research Center. A treatise on con- 
sumer psychology, it integrates scientific evidence, psychological 
theory, and economic principles in a neat package resulting in a new 
exciting discipline which the author aptly entitles “psychological 
economics.” The main purpose of the book is to show that consumer
attitudes, feelings, and notions can have a definite effect upon a
nation’s economy, and that a knowledge of the psychological con-
structs of the consumer population provides a sound basis for mak-
ing accurate economic forecasts.

Katona formulates the thesis that consumer attitude is as power- 
ful a determinant of the status of the national economy as is 
consumer income, and seeks to support his hypothesis through the 
research findings and theoretical considerations presented in this 
book. Hence, The Powerful Consumer should be of great interest 


which outlines the methodology, measurement problems, and theo- 
retical assumptions of economic survey research. 

It is this reviewer's opinion that the author has admirably achieved
his purpose in writing such a book—to demonstrate through a well- 
research findings that how the con- 
sumer feels toward a product or service being marketed, and also 


PETER F. MERENDA
University of Rhode Island 




Popular Conceptions of Mental Health: Their Development and 
Change by Jum C. Nunnally, Jr. New York: Holt, Rinehart 
and Winston, Inc., 1961. Pp. viii + 311. $5.00.

Popular Conceptions of Mental Health summarizes the results of 
six years of investigation conducted by members of the Institute 
of Communications Research, University of Illinois. The studies, 
supported primarily by The National Institutes of Mental Health, 
were designed to provide answers to two questions: 1) What are the 
existing conceptions of mental health? and 2) How can the existing
conceptions be changed for the better? 

Part I of the book, entitled “Studies of Existing States of Infor-
mation and Attitudes,” reports the findings of studies conducted to
measure the current level of public information about the mentally 
ill, and of studies designed to assess the public's attitude toward the 
mentally ill and toward mental health personnel. A 60-item factor- 
analyzed questionnaire was used for the information survey, and 
the semantic differential technique was adapted for the attitude
appraisal. The author makes a distinction between information and 
attitude which must be kept in mind while reading the book: The 
term information is used to refer to verifiable statements such as 
“There are more men than women in mental hospitals.” In contrast, 
a statement such as “I am afraid to be around anyone who has had 
a mental disorder” concerns an attitude, or feeling, in which no
question of truth or falsity is at issue. 

Throughout the book, general conclusions which may be drawn 
from the findings are neatly and conveniently presented in the form 
of propositions. Some of the salient propositions appearing in the 
first part of the book, and frequently coming as no surprise to the 
reader, are listed here: 


The public is uninformed about many issues. 

Experts are in reasonable agreement about some aspects of a 
public-information program. 

Public attitudes are relatively negative toward persons with 
mental-health problems.

The media of mass communication generally present a distorted 
picture of mental-health problems. 

A sizable proportion of mental patients are treated directly by 
general practitioners. 

General practitioners tend to have as negative attitudes toward 
the mentally ill as do members of the lay public. 


Part II, called “Studies of Information Transmission and Atti-
tude Change,” contains reports of research in which the effective-
ness of “experimental” and “control” messages on information and
attitude development and change among various groups was meas-
ured. Some of the findings, in the form of propositions, follow:




Public interest in communications about mental illness is in- 
creased when the messages reduce anxiety and provide solutions 
to problems. 

Communications relating to mental illness usually create a 
fairly high level of anxiety. 

Even if available information may turn out to be incorrect, it 
is better to give such information to the public than to with-
hold it.

If the source of a message is to be identified as being, or having 
been, mentally ill, it is better for the identification to be made 
after the substance of the message has been imparted. 

False information can serve a useful purpose on occasion. 


In an otherwise informative book, some features which merit 
criticism are the lack of reference to related studies which have been 
conducted in this general area (the bibliography has 15 references), 
the scanty index (77 entries), and the presentation of average 
semantic differential ratings without accompanying variances. The
last omission is especially apparent in instances where the mean
semantic differential ratings hover near the mid-point position of 
the rating scale. 
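
The force of this last criticism can be seen in a small sketch. The
ratings below are hypothetical (they are not taken from the book) and
assume a seven-point scale with a mid-point of 4. Both groups yield the
same mean, yet only the variance shows that one group is indifferent
while the other is sharply divided.

```python
from statistics import mean, pvariance

# Hypothetical ratings on a 7-point semantic differential scale (mid-point = 4).
neutral = [4, 4, 4, 4, 4, 4]      # every rater is indifferent
polarized = [1, 7, 1, 7, 1, 7]    # raters are split between the extremes


def summarize(ratings):
    """Return (mean, population variance) for a list of ratings."""
    return mean(ratings), pvariance(ratings)


neutral_summary = summarize(neutral)
polarized_summary = summarize(polarized)

print("neutral:  ", neutral_summary)
print("polarized:", polarized_summary)
```

A mean near the mid-point is therefore uninterpretable on its own; it
may reflect general neutrality or strong disagreement.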

Generally, this reviewer was favorably impressed by the author's
concise style of writing, by his development and complete descrip- 
tion of measuring devices exceptionally useful in the mental-health 
area, by his honesty in carefully pointing out the limitations of the 
measuring instruments and sampling techniques, and by the au- 
thor's facility for staying close to the actual research results. 

BENJAMIN KLEINMUNTZ 
Carnegie Institute of Technology 
Pittsburgh, Pennsylvania 


Glimpses from the World of the School Child by Åke Bjerstedt.
New York: Beacon House, 1960. Pp. 131. $3.50. 

An improved technique for helping adults to understand chil- 
dren is potentially of great value to humanity. The ones currently 
in use are of such limited value, according to the author of this 
monograph, that all too often the professional worker turns to let- 
ters and diaries for glimpses into the world of the school child. The 
author seems to have great hopes for his five-step intersubject interview
technique; certainly, he has been highly creative and has gathered
an immense amount of data; certainly, he deserves to be heard.

Bjerstedt, born in 1930 and now associated with the University 
of Lund in Sweden, has developed a method having anticipatory,
interview, and retrospective phases. Two children take turns in 
interviewing each other. Prior to this, each one individually showed
what he would say to elicit enough information from another child
to understand him, and then what he would say about himself to
enable another child to know him. In the final stage of
this five-stage sequence, each separately looks back at what hap- 
pened to give further ideas. 

Each set of eight records, obtained from each pair of children, is 
coded by two adults. In this analysis, information units are first
identified and then studied for amount, content, sequence, and
information level. A long list of concepts is defined for possible
use in this type of analysis.

The monograph is divided into three parts: a) methodological 
approach, b) developmental study among school children, and c) a
study on communication among children without a common lan- 
guage. The second and third parts provide the nearest thing there 
is to a “validation.” 

A certain amount of statistical analysis is used. Even so, the
technique seems to fall in roughly the same category as the Rorschach
so far as objectivity of scoring is concerned. With two coders
present, comparative statistics are made possible. The
coders were found to agree and to be consistent with themselves.
Even so, the usual difficulties are encountered in attempting to give 
some semblance of objectivity to unstructured material. 
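
The kind of comparative statistic that two coders make possible can be
sketched as follows. The category labels and codings are hypothetical,
and Cohen's kappa is used here only as one familiar chance-corrected
index of agreement; this review does not say which statistics the
monograph itself reports.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Proportion of units to which both coders assigned the same category."""
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of the coders' marginal proportions, summed
    # over categories (a Counter returns 0 for a category it never saw).
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical codings of eight information units into categories A, B, C.
coder_1 = list("AABBCCAB")
coder_2 = list("AABBCCBB")

agreement = percent_agreement(coder_1, coder_2)
kappa = cohens_kappa(coder_1, coder_2)
print(agreement, round(kappa, 3))
```

Raw agreement overstates reliability when a few categories dominate,
which is why a chance-corrected index is shown alongside it.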

The best indication of worth seems to come from the picture of 
how zestfully the children responded to this technique. Apparently 
the particular ones involved found it all quite fascinating. Com- 
pared to many ways used to stimulate children to respond under 
slightly structured conditions, this one may be especially produc- 
tive. In any case, Bjerstedt’s report seems to offer this hope. 

In summary, a fascinating story of a unique approach, applied to 
Swedish children and to children in an international camp, is pre- 
sented in a convincing manner. It should lead to many attempts on 
the part of others to apply the technique to numerous situations. 

JAMES W. RUSSELL
De Paul University 


“The Psychology of Expression: Dimensions in Human Perception”
by Sylvia Honkavaara. The British Journal of Psychology,
Monograph Supplements, 1961. Pp. 96.

A series of experiments was conducted to test the general hypothesis
that recognition of expression is not a primitive instinctive
ability but dependent upon a kind of mental maturation in
which biological and educational factors are influential. The ex- 
perimenter presents an interesting and refreshing (for the Ameri- 
can laboratory-bound psychologist) array of empirical evidence in 
support of her contention. By demonstrating that the recognition 
of expressions (e.g., happiness, sadness, anger, etc.) improves with
age—existing at an inchoate level in early childhood and becom- 
ing more pronounced in later years—Dr. Honkavaara suggests a 
compromise between European and American thought on this is- 
sue. She contends that in Europe instinctive function plays a major 
role in theories of recognition of expression, while in America a 
learning interpretation commands attention. Neither of these ex- 
treme positions is considered adequate to describe the “real” state 
of affairs. Only a theory which incorporates both the maturational 
and learning approaches can justly represent the actual complex- 
ity of the problem. 

This reviewer is left with the feeling, in spite of the heuristic 
value of the study, that too much has been made of too little un- 
equivocal data. Part of the difficulty—a problem inherent in this 
kind of subjective study in which the subject is required to make 
a verbal judgment based on his observations—stems from the seem- 
ingly inextricable relationship existing between perceptual proc- 
esses and the language used to depict the results of these processes. 
The experimenter acknowledges this problem but does not take it 
too seriously. The difficulty still remains. Is the evidence of lack 
of recognition of expression by most young children unambiguous 
proof that the perceptual process is relatively undeveloped or, 
rather, partly the consequence of inadequately formed linguistic 
habits? The criticism is not necessarily a serious one, since a num- 
ber of the experiments cited in the monograph present additional 
corroborating evidence using elementary instructions and requiring 
only the simplest of verbal responses from the younger subjects. 
Other weaknesses in the study, pertaining to the lack of certain
experimental controls (e.g., intelligence of subjects, standardized
procedures), might well disturb the more meticulous American 
psychologist. 

On the positive side this reviewer found a considerable amount 
of provocative theoretical discussion in Dr. Honkavaara’s mono- 
graph. One of the most interesting ideas is the hypothesized rela- 
tionship existing between the development of perception in the 
species and that of its individual members. The conjecture is that
ontogeny recapitulates phylogeny with reference to the develop- 
ment of recognition of expression. 

PATRICK J. CAPRETTA
Miami University 


Pupil Personnel and Guidance Services by Walter F. Johnson, Buf- 
ford Stefflre, and Roy Edelfelt. New York: McGraw-Hill Book 
Company, 1961. Pp. 407. 

The specific intent of this text is laudable: to place school guid- 
ance functions into the wider context of pupil personnel services. 

Although the purpose of the book is specific, it is designed for an
extremely heterogeneous audience. “This book is intended primarily
as a textbook for the basic course in pupil personnel services,
guidance, or the counseling sequence. Typically the membership in 
such courses is divided between (1) those who are taking the first 
in a series of courses designed to educate them to work as school 
counselors, and (2) those teachers who wish to understand their 
guidance role better and to gain in appreciation of the resources 
open to them through the use of pupil personnel and guidance 
specialists.” 

There is yet another aspect to the authors’ purposes to be men- 
tioned, the role of the classroom teacher. “The teacher serves as 
the primary instrument by which the vast machine of the school 
takes care of the unique needs of the child. To perform this vital 
function, the teacher must learn to use the special skills of the 
pupil personnel worker. These specialists in turn will take over a 
larger role as consultants to the teacher. . . .” 

Aside from the necessary introductory and concluding chapters, 
the book is divided into three major parts. Part two, about fifteen 
per cent of the content, is devoted to a discussion of the child’s 
individual and social development, including a chapter on the “Ex- 
ceptional Child” and “Parents and the School.” Perhaps anticipat- 
ing questions about such material in a text of this nature, the au- 
thors explain that “this section provides a convenient and often 
needed review of basic materials that must be secured by supple- 
mentary reading assignments, which are often inconvenient and, in 
some teaching settings, impossible." 

Part three, about one-third of the content, is devoted to the roles 
of the various pupil personnel workers. The emphasis here is both 
on the role of the pupil personnel generalist, i.e., administrator and 
teacher, and the role played by the specialist. 

Part four, about one-third of the content, is devoted to the tech- 
niques of the pupil personnel worker. Very limited discussions cover 
a wide variety of techniques including, among others, the interview, 
rating scales, observations, standardized tests, scattergrams, and 
case studies. 

Careful reading of the text suggests that the authors fulfilled 
some of their objectives quite well. Certainly the reader is given a 
general overview of the clientele, the workers, and the techniques 
of pupil personnel work. The limitations of such an all-encompassing
effort are obvious.

The role of the teacher is viewed as paramount and the writing 
is designed to enhance that view. It must be mentioned, however, 
that the authors’ views are not likely to be accepted by all people 
involved in counselor education or for that matter in teacher 
education. 

The attempt to provide a text for a very heterogeneous audience
was also successful. However, in this reviewer's judgment, those who
are preparing to be high school counselors are less likely to benefit than
the classroom teacher or school administrator who is looking for 
an overview of the field. 

This book should appeal to those who are interested in a pan- 
orama of materials and in enhancing the role of the classroom 
teacher. It is less likely to appeal to those who are interested in 
the more explicit functionings of pupil personnel workers. 

GILBERT D. MOORE
University of Buffalo 


Problems in Vocational Counseling by Lloyd H. Lofquist and 
George W. England. Dubuque, Iowa: Wm. C. Brown Company, 
Inc., 1961. Pp. xxii + 186. $3.50. 

The authors have selected forty-six “problems” designed to “em- 
phasize important considerations in the counseling process by in- 
volving the student in the actual interpretation of research data 
and by acquainting him with the selected references in the coun- 
seling literature.” This approach reflects their conclusions that 
many counselors have not given serious consideration to the appli- 
cation of present knowledge, as well as that “the vocational coun- 
seling process is essentially the same” even though the setting may 
vary with different groups such as students, adults, and handi- 
capped workers. 

The introduction, which was written by Donald G. Paterson, 
gives an historical overview and stresses that this is an opportune 
time for the appearance of a problem-oriented book. In the Appen- 
dix the authors suggest four possible uses of the problem approach: 
1) supplementing basic text assignments in a formal counseling 
course, 2) organizing such a course around selected problems with 
supplementary readings from traditional texts, 3) use as a supplemental
source for discussions in a practicum, and 4) organizing
in-service training sessions around selected problems. The Appendix
includes, also, a Text Reference Chart correlating the various parts
of the book to pertinent sections of thirty texts in counseling,
student personnel work, guidance, rehabilitation counseling,
employment counseling, and testing.

The scope of the book is indicated by the seven areas or “parts”
used in grouping the problems: 1. the nature of vocational counseling,
2. meeting individual needs in counseling, 3. use of tests in
counseling, 4. the counseling interview, 5. work history data in
counseling, 6. occupational information in counseling, and 7.
facilitating the counseling plan. In general each problem is presented by
a brief introductory statement which in many cases includes data
presented in tabular and/or graphic form. The remaining major
subdivisions of the problem include: 1) one or more questions, 2)
discussion of each question including what the authors call a “school
solution,” 3) the source or sources of the data, and 4) for approximately
half of the problems one or more additional references.

The problems sampled range from relatively general topics such
as “an operational definition of counseling” or “ethical practices”
to the relatively specific “construction of test profiles.” A number
of the problems emphasize the contribution of research to the
information or procedures which the counselor may use in counseling
situations. One example is the problem in which the relative merits
of the multiple-correlation method of combining test scores are
compared with those of the multiple-screen method; others are
“reinforcement of verbal behavior” and “interviewer attitudes and
interview results.”
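
The contrast between the two methods of combining test scores can be
sketched briefly, assuming that the multiple-correlation method amounts
to a weighted composite score and that the multiple-screen method
requires a minimum score on every test. The weights, cutoffs, and
scores below are hypothetical and do not come from the book.

```python
def composite_select(scores, weights, threshold):
    """Compensatory rule: a weighted sum of test scores must reach a threshold."""
    total = sum(w * s for w, s in zip(weights, scores))
    return total >= threshold


def multiple_screen_select(scores, cutoffs):
    """Noncompensatory rule: every test score must reach its own cutoff."""
    return all(s >= c for s, c in zip(scores, cutoffs))


# Hypothetical values, chosen only for illustration.
weights = (0.5, 0.5)
threshold = 60
cutoffs = (50, 50)

uneven = (90, 40)   # strong on one test, weak on the other
even = (60, 60)     # moderate on both tests

for name, scores in (("uneven", uneven), ("even", even)):
    print(name,
          composite_select(scores, weights, threshold),
          multiple_screen_select(scores, cutoffs))
```

Under the composite rule a high score on one test can offset a low
score on another; under the screening rule it cannot, so the two
methods may disagree about the same person.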

The problem approach offers flexibility and appeal to the student 
as an aid in bridging the gap between theory and practice. The 
analyses are stimulating and reflect the education and experience of 
the authors. In some cases, however, the problems are very brief, 
with the entire treatment for a problem covering less than two 
pages. Furthermore, it would appear to this reviewer that the ten- 
dency to have a limited amount of source or reference material for 
various problems is a decided weakness. Even though the bibliog- 
raphy lists 115 references, it should be noted that for thirty of the 
problems only one source is listed and for five of the remaining 
problems only two sources are given. In eighteen or twenty of these 
thirty-five problems no additional reference material is given. The 
authors may have hoped to “pinpoint” their presentations or to 
appeal to beginning students, but the value of the book in the sug- 
gested use for a practicum or an in-service training program would 
be enhanced by more material. 

This is a volume which has considerable potential. The brief, 
succinct presentations of the problems are an advantage in focusing
the students’ attention, but they also are a disadvantage in the lack 
of comprehensiveness. The book may encourage other authors to 
utilize a similar approach. 

HOWARD E. TEMPERO
University of Nebraska 


Psychology, A Scientific Study of Man by Fillmore H. Sanford. 
San Francisco: Wadsworth Publishing Company, Inc., 1961.

Can a text for a first course in psychology be written in such a
manner that it is usable, and yet an original contribution to knowl- 
edge? No doubt, curriculum research is needed pertaining to Gen- 
eral Psychology. If some radically new approach were tried and 
proved, this would presumably be worthy of classification as a
contribution to the furtherance of human knowledge. Whatever can
be said for or against this book by Sanford, it certainly is not
particularly original in over-all organization or format. It stays
close to the beaten path and appears at first examination to be 
very much any one of a dozen other recent books written for the 
same course. 

On the other hand, Professor Sanford, who once taught at Har- 
vard and now is at the University of Texas, does seem to be trying 
for a “wisdom” approach. He seems to be searching for basic principles,
or laws, that have been developed on basic points in psychology
and related fields during the past half a dozen decades. He
starts his book with an interdisciplinary, intercultural, and
interorganismic approach.

With this frame of reference, he sets a pattern for the remainder 
of the book, thus leaving himself open for pitfalls into which he 
occasionally falls. In the very first chapter, he gives a swift kick 
to the sacred cow of all church-related institutions—the soul. No 
doubt the soul is difficult, if not impossible, to study using the meth- 
ods of psychology. This is a good point. On the other hand he 
sounds as if he were saying that to believe in the soul is to be 
superstitious. 

Then, because he is talking about very deep and basic subject 
matter, he leaves out such things as hypnotism, the application of 
the subject matter to student problems (such as courtship and 
study habits), and things such as extrasensory perception, which 
also enliven the course at points, even if they weaken the impres- 
sion that psychology is a science like biology and physics. 

Speaking of biology and physics, he does a rather good job in 
those chapters which pertain to electromagnetic spectrums and per- 
ception. On the other hand, he misses a chance here to offer some- 
thing so often absent in these texts—a really good explanation as 
to why these are in the text and course. 

His excellent illustrations are well worked into the discussion. 
The instructor will, however, see some materials he has seen before, 
such as the pictures of the maturing infant by Shirley (1933) and 
the invalid data about the AGCT and occupations. 

Things are brought up-to-date. He mentions the 1960 Binet and 
the research on mother love by Harlow. On the other hand, he 
speaks of the social behavior of insects and fails, conspicuously, 
to bring this up-to-date by implying that their building and social 
behavior is clearly a matter of instincts. 

A class of gifted sophomores, having had courses in biology and 
physics, might find this text stimulating and fairly easy. One 
might doubt, however, that most college sophomores would go away 
from this course saying that it is a snap. On the other hand, there 
are more difficult ones written for use in General Psychology. 

Incidentally, this book is strong on statistics, personality, and
individual differences. The illustrations are coordinated well with
the making of a series of points about the basic statistics of psy- 
chometrics and the basic methods of psychotherapy. 

In summary, a new text has been added to the many already on 
the market for this course. The coverage is “basic” and stays 
within the range of things that most instructors would agree should 
be covered. It also leaves out many that someone teaching the 
course might want to include. In any case, what he does cover, he 
does well. This book is recommended for use in the first course in 
psychology where the objectives of the program are in line with 
what Professor Sanford covers. 

JAMES W. RUSSELL
De Paul University 


Handbook of Research Methods in Child Development by Paul 
Henry Mussen (Editor). New York: John Wiley & Sons, 1960. 
Pp. vii + 1061. $15.25. 

On the premise that research in child development wants im- 
provement in both quality and quantity, this book presents 22 
chapters of help. Counting the editor, there are 31 authors who, 
together with editorial consultants, comprise a partial who’s who 
in this domain. The work was to emphasize method over findings; 
to be delimited to techniques actually employed; and to exclude 
the clinical and anything otherwise adequately described, such 
as mental testing and the processing of statistical hypotheses. 

In general the delimitations were maintained, to the book’s ad- 
vantage. The desired emphasis on method over content was difficult 
to preserve, and in some instances was probably impossible and 
undesirable. It is necessary to record that some chapters (notably 
on chemical-physiological growth, learning, and linguistics) tended 
to give expositions of their general fields or to list results rather 
than to attend primarily to developmental research design. 

Two implicit delimitations need also to be pointed out, not to 
indicate sin or shortcoming, but as caveat to the naive purchaser
who might expect the unreasonable. First, the word "child" is taken 
narrowly. One becomes aware, halfway through, that adolescent 
phenomena and example are nearly omitted and that infancy receives
something less than continuous “treatment.” Second, this is
not a "handbook" in the model found on engineers' desks. If one
seeks the do's and don'ts, say, of means of eliciting sympathy or 
the conditions of greatest peer influence, he may eventually dis- 
cover them by reading in the appropriate sections. He will not find 
them via an index address to a neat, systematic, and thorough
series of how-to's. That this is strength and not weakness comes 
with the awareness that the authors treated method as a function 
of theory and purpose. That is to say, here is a book which will
help a scholar to derive or adapt technical design by understanding
the larger framework within which he is working. 

The book is a vast, wise, unhurried symposium. Imperfect, it 
abounds in between-chapter redundance only to be expected in so 
large a work (for example, the Rorschach is discussed well in two 
separate places and given more than mention in others). But cor- 
rective editing might have hurt the unity in each chapter's treat- 
ment. The within-chapter editing was excellent: the book as a 
whole is much more fun than chore to read. 

The four early chapters are general, covering theory, design, and 
implications of developmental study. Except for the one on experimentation,
they are perhaps self-conscious, sometimes labored, but
well worth study. The experimentation chapter is a more direct 
exposition of problems met with different forms of study at the 
various ages. The second section contains three chapters on biological
development, of which that on anthropometry is almost
definitive. The third section has an uneven five chapters on cogni- 
tive growth. The best, within the purpose of the book, is that on 
perception (in this reviewer’s judgment). The above-mentioned
chapter on linguistics is clear and readable; though not the only 
source available to psychologists on the subject, and not seriously 
involved with issues of design, the inclusion of it should serve the 
heuristic purpose. The fourth and fifth sections, with nine chapters 
(none weak) on personality and socialization, contain most of the 
mentioned duplication, but one is reluctant to suggest the knife 
anywhere. The reviewer suggests that the chapters on children’s 
groups and on interviewing are worth special citation. 

Finally, readers of this journal may want some gratuitous comments
pertaining to measurement and the schools. Statistical methods
as such were precluded. Factor analytic exploration of intellect's
ontogeny was only perfunctorily treated. A chapter on ability
testing contained standard material available otherwise and per- 
haps unnecessary to have included. More seriously, the school and 
teacher were ignored either as dependent or independent variables, 
nor was there mention of the enormous potential for the retrieval 
of developmental data implicit in the kind of cumulative records 
nowadays being kept. Putting aside such omissions, the book re- 
mains a powerful, scholarly treatise capable of great contribution 
to research.

C. E. MEYERS
University of Southern California 


Human Psychological Development by Elizabeth Lee and Phyllis
C. Martin. New York: The Ronald Press Company, 1961.

This book is the second of two books which grew out of teaching 
a general education course for college sophomores in a liberal arts 
college. The course was taught jointly by a “psychologically-ori- 
ented biologist” and a “biologically-oriented psychologist.” Al- 
though this text is considered an introduction to the subject and 
“does not presume previous subject-matter background,” it is
reasonable to suppose that the authors would feel that a course based
on the material in their first book, Human Development, would 
offer the student a better integrated view of the field in terms of 
the authors’ general theoretical orientation. It is beyond the scope 
of this review or the competence of this reviewer to evaluate the 
first book, Human Development, which according to the authors 
offers an introduction to human biology broadened through the in- 
clusion of embryological, psychological, and social anthropological 
material. An examination of both texts reveals a clear-cut the- 
oretical position employing concepts to interpret and describe hu- 
man behavior and experience which are mechanistic, molecular, 
atomistic, static, and reflexological as opposed to concepts which 
are dynamic, holistic, field theory oriented, emergent, and
organismic.

Man is seen as a physiological-psychological being who has both
a body and a mind, and thus the authors perpetuate the outmoded
dualism of mind and body. They accept the psychosomatic approach
but restrict the term to diseases such as peptic ulcer, apparently 
ignoring the fact that the term “psychosomatic” was invented to 
eliminate the dualistic approach to the mind-body problem. The 
psychological orientation is a curious mixture of Wolff's faculty 
psychology, the mental chemistry of the British Associationists, and 
the behavioristic views of John B. Watson. The nouns “brain,”
“mind,” “personality,” “will,” “intellect,” “self” are treated as en- 
tities. It is stated that the basic units of personality are habits, at- 
titudes, and feelings which are likened to the basic cells of the body. 
The neurological basis of behavior is presented in factual terms 
rather than in terms of theory, logical constructs, or intervening 
variables. The basic unit of behavior and experience is the reflex
arc reduced to neurons and “sensations.” Reflexogenic activity,
which is alleged to characterize the behavior of a month-old in- 
fant, is distinguished from behavior in the older baby which be- 
comes more "cerebral" and a response from a "sensation." The 
contradictions of adolescent behavior and mood are felt to be best 
understood in terms of Waldeyer's Neurone Theory although the 
name, Waldeyer, was not mentioned nor was the explanation
theoretical. Despite the authors’ theoretical orientation, the facts,
principles, and theories concerning learning, associative or field, are 
omitted. Prenatal development is treated in detail from a biological 
point of view rather than from the point of view more familiar to
psychologists. The habit-formers approach to child training is pre- 
ferred to the needs-oriented or clinical approach. Discipline as a 
child-rearing technique is emphasized for the orderly control of 
the personalities and bodies of persons and made analogous to the 
orderly integration of the cells of the body. 

The publishers cooperated with the authors to produce a book 
that is highly attractive in style and format. The print is legible 
and the topics and headings are clearly delineated. The text abounds
with photographs, tables, and extremely ingenious illustrations
that serve well the authors’ didactic purposes. The writing
is clear and concise, and should be easily comprehended by the
beginning student. This was the authors’ goal and they have achieved
it admirably.

The topical plan for the presentation of the subject matter is 
typical and characteristic of the best standard books covering the 
field of developmental psychology within the authors’ theoretical 
orientation. Individual development is divided into ten stages 
covering the period from conception to old age and each stage is 
discussed topically in terms of physical growth, emotional growth,
personal-social growth, intellectual growth, and growth in values. 
The presentation is not rigid, and appropriate topics for each stage 
are introduced. Considerable space is given to the adolescent years, 
and 50 to 60 years of adult life are telescoped into three chapters. 
The material presented is interesting, original, and relevant. The 
interpretations have the flavor characteristic of mental and personal
hygiene approaches backed by sound, common sense prescriptions
for handling and understanding the developmental tasks
and conflicts facing the developing person. The authors present
biological facts solidly and effectively. Their physiological-psychological
interpretations seem outmoded and obsolete.

College instructors offering courses in developmental or child 
psychology, which presume a prerequisite course in general psychology
based on standard and widely-used texts, would find Human
Psychological Development far too elementary and narrow in
scope. Clinically-oriented instructors in the field of child development
would obviously reject this text. The text is suitable for those
who accept the authors’ objectives and basic orientation. 

E. KENNETH CARPENTER
University of Rhode Island 




Guidance—Principles and Services by Frank W. Miller, Columbus, 
Ohio: Charles E. Merrill Books, Inc., 1961. Pp. 426.

This book is conveniently divided into six parts, each containing
two or more subdivisions. The intrinsic organization is well done 
with bold type prefacing major points of emphasis. The work is
scholarly and documented throughout with extensive bibliographic
entries following each major subdivision. 

Part One is devoted to the history and principles underlying the
guidance movement. The semantical problems involved in this area 
are well set forth, but the author gives a definition of guidance that 
is not only wordy but also requisite to a number of specified factors. 
The author does much better in distinguishing between guidance 
and counseling. The list of misconceptions concerning guidance 
presents a good springboard for the chapters that follow. The mis- 
conception that “Guidance is the Province of Specialists" is ably 
handled, but unfortunately the misconception that a “little train- 
ing makes for good guidance” is omitted. Even as later set forth 
in Chapter Four, the role of the teacher in guidance is not simple. 
The history of the guidance movement is both concise and adequate, 
encompassing the National Defense Education Act of 1958 and the 
guidance implications contained therein. 

Part Two is devoted to organization, function, and responsibilities 
of the school guidance program. The author is direct and accurate 
but seems to have omitted the role of extra-school agencies in guid- 
ance. The role of the classroom teacher in the counseling program 
is practical, workable, and should be acceptable to most school sys- 
tems. However, the role of the administrator is given but brief at- 
tention. Many counselors will not be so fortunate as to work in 
enlightened systems as implied by the last omission. One of the 
objectives of an introductory course in counseling should be con- 
cerned with the problems of administration as well as soliciting 
“democratic leadership” and “support.” 

The third part of the book is well formulated and includes the 
various traditional means of gathering data. The remainder of 
the chapters in this section deal with the services that should be 
rendered: counseling, information, research and evaluation. The
author does not burden the reader with countless extracts from tests 
or cumulative records. 

Part Five, “Guidance Services at Work,” appears to be the weak- 
est section to this writer. Several systems are presented, not as 
models but as working situations. Each of these programs has the 
appearance of a paper presented in a graduate seminar and, in 
spite of the short critique by the author following each program, 
does little more than add pages to the publication. Stressing the 
woes of other institutions has little educational benefit beyond 
effecting awareness and arousing sympathy. 

The last part of the book, which is in the form of appendices, 
contains samples of cumulative and permanent record cards as well 
as group guidance materials. 

The most notable contribution of this book is the organization 
and clarity of the writing. It serves the author’s purpose in presenting
another book for beginning courses in counseling and guidance.
ROY M. FITCH
San Fernando Valley State College 


Letters to My Teacher by Dagobert D. Runes. New York: Philosophical
Library, Inc., 1961. Pp. 105.

This is a very short book and it could have been much shorter. 
The various letters show a great degree of repetition and redundancy.
From one letter to the next, in spite of the considerable degree of
rehashing, the book is still very provocative. It portrays a
rather widely swinging subjectivist throwing left field haymakers
at the realists. At times a pragmatic approach appears and then
fades into the background. 

No segment of the present-day school curricular offerings is safe
from the wild generalizations that contain enough elements of truth 
to add to the credence of supportive statements. The author seems 
to feel that the “evils” of education beget all the troubles of the 
world. Scholars of the various disciplines should be most interested 
in the attacks made upon the subject matter offered and the method 
of pedagogy employed. 

All in all, this short book, which could have been condensed into
one letter, will provide many hours of conversation and debate.
ROY M. FITCH
San Fernando Valley State College 




EDUCATIONAL and 
PSYCHOLOGICAL


MEASUREMENT 


Editor: G. Frederic Kuder, Duke University 
Associate Editor: John A. Hornaday, Greensboro College 
Assistant Editor: Joan F. Hornaday 
Business Manager: Geraldine R. Thomas 


BOARD OF COOPERATING EDITORS 


LOUIS D. COHEN
  Duke University
HAROLD A. EDGERTON
  Richardson, Bellows, Henry and Co.
MAX D. ENGELHART
  Chicago City Junior Colleges
E. B. GREENE
  Chrysler Corporation
J. P. GUILFORD
  University of Southern California
E. F. LINDQUIST
  State University of Iowa
FREDERIC M. LORD
  Educational Testing Service
ARDIE LUBIN
  Walter Reed Army Institute of Research
SAMUEL MESSICK
  Educational Testing Service
WILLIAM B. MICHAEL
  University of California, Santa Barbara
M. W. RICHARDSON
  Richardson, Bellows, Henry and Co.
JOHN H. ROHRER
  Georgetown University School of Medicine
P. J. RULON
  Harvard University
DAVID SEGEL
  Indiana University
C. L. SHARTLE
  Ohio State University
H. C. TAYLOR
  The W. E. Upjohn Institute for Community Research
THELMA G. THURSTONE
  University of North Carolina
HERBERT A. TOOPS
  Ohio State University
E. G. WILLIAMSON
  University of Minnesota
BEN D. WOOD
  Columbia University
DOROTHY ADKINS WOOD
  University of North Carolina


VOLUME TWENTY-TWO, NUMBER THREE, AUTUMN, 1962


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 3, 1962


RATINGS SHOULD BE SCRUTINIZED 


J. P. GUILFORD, P. R. CHRISTENSEN 
University of Southern California 
G. TAAFFE 
Los Angeles State College 
AND R. C. WILSON
Portland State College 


PROBABLY the most serious fault in the application of ratings is 
that their validity is accepted on faith, where investigation might 
show that the faith was seriously unjustified. The main purpose of
this article is to call attention to a very striking miscarriage of good
experimental intentions in connection with the use of ratings as
criterion data for the validation of tests. An important secondary 
purpose is to describe validation procedures that are unusually in- 
formative and that include features that should meet some of the 
difficulties encountered in validation studies. 

The authors have been engaged for some time in attempts to discover
primary abilities in the general domain of thinking.¹ We have
recognized that, in the long run, it is not sufficient to uncover new
factors, to interpret them as primary thinking abilities, to ascribe
to them apparent properties, and to point out implied significance
for thinking performances in everyday life. In order to establish the
definition and description of a factor and to support its claim to
generality in personality, it is very desirable to obtain independent
evidence concerning it by research operations other than those of
factor analysis, at least factor analysis of the same type used in
discovering the factors. In order to demonstrate a factor's social
significance, it is necessary to show that it bears relationships to
everyday life performances.

¹ Under contract N6onr-23810 with the Office of Naval Research. The project
was directed by the senior author, who was responsible for the writing of this
article. The validation study on which it is based was primarily the
responsibility of Gordon Taaffe. P. R. Christensen and R. C. Wilson took part in
planning the study and its operations. We are indebted to Dr. Edward M.
Glaser for suggesting the study and for consultations in connection with it.
The opinions expressed in this article are our own and are not necessarily
shared by Dr. Glaser or by the U. S. Navy.

An opportunity arose in 1952 for a validation study of some of 
our factor tests in relation to the performance of research scientists 
in a California industrial organization.² Thirteen tests, ten of which
measure certain of the factors of reasoning and creative thinking
known at that time, were administered to 53 scientists.³ It was believed
that some or all of these factors might well play significant
roles in the performances of research scientists. The scientists were 
rated by supervisors who were instructed in rating procedures by us 
and who made the ratings under our supervision. Ratings were made 
on eight defined traits (factors) and on over-all research performance. 

Our validation problem was a double one: (1) to determine 
whether a test score for a factor will correlate with evaluations of 
individuals with respect to that factor as evidenced in the perform- 
ance of a scientist, and (2) to determine the relative importance of 
each factor in over-all scientific performance. The first problem is a
matter of validating the factors, as such, and it calls for the raters
to evaluate each ratee on each factor. In a relatively small sample,
this type of validation is the more likely to yield significant correlations.
The second problem calls for correlations of each test with an over-all
rating of performance.

The reason for the last statement, which will be obvious to some
readers but not to all, is that any over-all criterion measure is
almost certain to be factorially complex. Any particular factor is
likely to have only a small loading in the criterion. Let us say that
the test of a certain factor has a loading of .50 for the factor, a
value which is rather typical. Suppose that the over-all criterion
has a loading of .30 in the same factor. If the correlation between
this test and this criterion is limited to their covariance in this one
factor, the validity coefficient would be .5 × .3, or .15. It would
require a sample of 150 to 200 for a correlation coefficient of this size
to achieve significance at the 5 per cent level. If, however, we choose
a special aspect of the criterion that emphasizes the factor, a measure
of this aspect should have a substantial loading in the factor. If
the loading for this factor is .50, for example, the validity coefficient,
if limited to this one factor, would be .25. A sample of 60 would be
sufficient to indicate genuine correlation at the 5 per cent level of
confidence. For these statistical reasons, validation factor by factor
is sometimes possible where validation of relatively pure factor tests
against an over-all criterion would fail to give decisive results.

² From the beginning of the study this organization requested to remain
anonymous.

³ The number tested was more than 70. The shrinkage to 53 was due to
insufficient rating data.
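The arithmetic in this argument can be checked with a short sketch. It multiplies the two loadings to get the validity coefficient and then estimates the smallest sample in which a correlation of that size reaches the two-tailed 5 per cent level. The sample-size step uses Fisher's z approximation, which is an assumption of this sketch and not necessarily the authors' method; the function names are illustrative only.

```python
import math
from statistics import NormalDist


def validity_from_loadings(test_loading, criterion_loading):
    """Correlation implied when test and criterion share only one factor."""
    return test_loading * criterion_loading


def min_n_for_significance(r, alpha=0.05):
    """Smallest n at which a correlation r is significant (two-tailed).

    Uses the large-sample result that atanh(r) * sqrt(n - 3) is roughly
    standard normal when the true correlation is zero.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z_crit / math.atanh(r)) ** 2 + 3)


# Over-all criterion: loadings of .50 (test) and .30 (criterion).
r_overall = validity_from_loadings(0.50, 0.30)
# Specialized criterion: loadings of .50 and .50.
r_special = validity_from_loadings(0.50, 0.50)

print(round(r_overall, 2), min_n_for_significance(r_overall))
print(round(r_special, 2), min_n_for_significance(r_special))
```

Under this approximation the required sample falls inside the 150 to 200 range quoted for a coefficient of .15 and close to 60 for a coefficient of .25.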

Although our sample was too small to offer much possibility of
significant correlations between any test and an over-all criterion,
we asked for a rating of over-all performance for whatever informative
value it might have. In a complete validation study, it would
be desirable to take the double approach, correlating tests both with
special criterion measures and with a composite or general evaluation
whose ingredients are fully representative and appropriately
weighted. Although the use of special criteria only avoids the weighting
problem, eventually the weighting problem must be faced. It is
our belief that the ingredients are best known in terms of factors of
personality and that a rather large sample is required to give us a
picture of their relative weights.

The success of the approach using specialized criteria, one for 
each factor, depends upon the ability of the raters to make the 
necessary distinctions among factors and upon their having made 
the appropriate observations upon which to base their judgments. 
If a factor to be rated is “real” and if there are overt manifestations 
of it in the sphere of activity of the criteria, observers should be 
able to furnish judgments that have some variance in the factor. 
These considerations are basic to assumptions in our study. Our
main hypothesis was that a test in which a certain factor predomi- 
nates will correlate higher with ratings of that same factor and will 
correlate little or none with ratings of other factors. This assumes a 
high degree of independence of the factors. 

In developing rating forms, a conference was held with two
company executives and a consulting psychologist retained by the
company. The various factors of reasoning and creative thinking
were discussed in relation to observable performances of the employees.⁴
Names and descriptions of the factors were framed in
ways that it was believed the rating supervisors would understand.
For example, the factor of “ideational fluency” was presented under
the title “Quantity of ideas,” with the following description: “When
this person is faced with a question or problem he immediately has
a lot of ideas about it. It does not matter for this trait whether the
ideas are any good or not. It is the number of ideas per unit of time
that counts.”

⁴ For information concerning the thinking factors involved see Green,
Guilford, Christensen, and Comrey (1953) and Wilson, Guilford, Christensen,
and Lewis (1954).

The eight factors and their (usually) modified trait names were:

Factor                          Trait Name
A. Ideational fluency           Quantity of ideas
B. Originality                  Quality of ideas
C. Logical evaluation           Evaluation of ideas
D. Adaptive flexibility         Adaptive flexibility
E. Spontaneous flexibility      Versatility of ideas
F. Sensitivity to problems      Sensitivity to problems
G. General reasoning            Analysis of problems
H. Redefinition                 Improvising

The rating form was a graphic scale with each trait on a separate
page, on which there was room for the names and linear ratings of
a number of ratees. Five supervisors rated from 14 to 22 employees
each, with some overlapping.

Eleven thinking-factor score variables were correlated with each
of the nine rating variables, both with rater-composite ratings and
with ratings within individual raters. Of the 99 correlations when
composite ratings were used, 13 were significant beyond the 5 per
cent level of confidence and 1 beyond the 1 per cent level. The pattern
of significant correlations was so irregular as to provide no support
to the main hypothesis that tests of a factor would correlate
relatively higher with ratings of the same factor. In fact, only one
of the factor tests correlated significantly with the trait hypothesized
to be most related to it. This was a correlation of .28 for the
score and rating of the factor "spontaneous flexibility."

These results would have been very discouraging except for certain
other facts. One suspicious result was that the eight trait ratings
(composites) had an average intercorrelation of approximately
.70 (with a range from .59 to .89). On the one hand, this is
some indication of the reliability of the composite ratings. On the
other hand, it indicates that the raters were very strongly confusing
the traits with one another, perhaps to the point where there was
little real trait differentiation. How much these intercorrelations
rest upon actual intercorrelations of factors is unknown. Some of the
covariances, for example, can very likely be attributed to systematic
variations in leniency errors of the different raters. It is very unlikely
that the factors actually intercorrelate to the extent exhibited
by the composite ratings.
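A small simulation can illustrate how a shared rating error inflates the intercorrelation of ratings on traits that are in fact unrelated. This is only a sketch under stated assumptions: two independent traits with unit variance, plus one halo or leniency component per ratee that is added to both ratings. The names and the value of `halo_sd` are hypothetical and were chosen so that the expected intercorrelation is near the .70 reported above; none of this comes from the study's data.

```python
import random


def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_halo(n_ratees=2000, halo_sd=1.5, seed=0):
    """Return (correlation of true traits, correlation of their ratings)."""
    rng = random.Random(seed)
    trait_a = [rng.gauss(0, 1) for _ in range(n_ratees)]
    trait_b = [rng.gauss(0, 1) for _ in range(n_ratees)]
    # One shared error per ratee, entering the ratings of both traits.
    halo = [rng.gauss(0, halo_sd) for _ in range(n_ratees)]
    rating_a = [t + h for t, h in zip(trait_a, halo)]
    rating_b = [t + h for t, h in zip(trait_b, halo)]
    return pearson(trait_a, trait_b), pearson(rating_a, rating_b)


true_r, rated_r = simulate_halo()
print(round(true_r, 2), round(rated_r, 2))
```

With these settings the expected rating intercorrelation is halo_sd² / (1 + halo_sd²), about .69, even though the underlying traits are uncorrelated.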

A study of the intercorrelations of test scores with trait ratings by 
individual raters was even more revealing. Although the samples on 
which these correlations were based are quite small, the trends in 
the correlations point clearly to certain hypotheses. The general 
finding was that no matter what trait a rater was asked to rate, his 
ratings of the same employees tended to correlate about the same 
with every test, high with some tests, low with others. For example, 
one rater's correlations might be in the neighborhood of .50 with the 
test of a certain factor, .30 with the test of another factor, and close 
to zero with other tests. Examples involving the tests Logical Reasoning
(for the factor of logical evaluation) and Verbal Analogies I
(for the factor of eduction of conceptual relations) are given in 
Tables 1 and 2. 


TABLE 1

Correlations Between the Logical Reasoning Test and
Traits as Rated by Individual Raters

                             Rater
Trait         Tᵃ      U      V      Y      Z    M_tᵇ
A            -.17   -.20    .14   -.23    .38   -.01
B            -.27    .07    .18   -.31    .63    .08
Cᶜ           -.33   -.19    .30   -.05    .52    .06
D            -.12    .14    .52   -.31    .75    .25
E            -.17   -.27    .13   -.43    .18   -.12
F            -.32   -.25    .26   -.36    .35   -.07
G            -.40   -.16    .35   -.27    .73    .09
H            -.31    .17    .28   -.15    .58    .13
Over-all     -.29   -.22    .28   -.36    .52   -.01
M_r          -.26   -.10    .27   -.28    .54

ᵃ N's for raters were: T, 22; U, 14; V, 14; Y, 22; Z, 17.
ᵇ M_r = mean r for a rater and M_t = mean r for a trait. Fisher's z transformation was used.
ᶜ Trait C should have correlated highest with this test.
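The table note says the mean correlations were obtained through Fisher's z transformation. A minimal sketch of that averaging follows, applied to the Trait C row of Table 1 as read here (-.33, -.19, .30, -.05, .52). The function name is illustrative, and the unweighted mean of z values is an assumption of the sketch.

```python
import math


def mean_r_fisher(correlations):
    """Average correlations by way of Fisher's z transformation."""
    z_values = [math.atanh(r) for r in correlations]   # r -> z
    mean_z = sum(z_values) / len(z_values)             # average in z units
    return math.tanh(mean_z)                           # z -> r


trait_c = [-0.33, -0.19, 0.30, -0.05, 0.52]
print(round(mean_r_fisher(trait_c), 2))
```

The result rounds to the .06 printed in the last column for Trait C.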




TABLE 2

Correlations Between the Verbal Analogies I Test*
and Traits as Rated by Individual Raters

                                           Rater
Trait              T             U             V             Y             Z            M_t
A                 .08           .27           .40       [illegible]   [illegible]       .16
B                 .04       [illegible]       .81       [illegible]       .07           .20
C                 .14       [illegible]       .72           .26           .54           .40
D                 .23       [illegible]       .61           .06           .18           .34
E                 .02           .24           .44          -.34          -.46           .02
F                 .17       [illegible]       .70       [illegible]      -.09           .23
G                 .12           .16           .68           .14           .14           .27
H                -.06       [illegible]       .49       [illegible]      -.06       [illegible]
Over-all      [illegible]      -.25       [illegible]       .05           .12       [illegible]
M_r               .09           .16           .62           .05           .04

* This test emphasizes a factor that was not specifically rated.


Two hypotheses are suggested by these results. One is that each
rater based his ratings largely on one or two factors,
regardless of trait. According to this hypothesis, each rater's “halo
effect” stemmed from bias toward one or two factors, his observations
and evaluations of which determined his ratings on all traits.
For example, from the data in Table 1, it would appear that rater Z
tended to evaluate the ratees in an order consistent with their standings
in the factor of logical evaluation. His ratings correlated from
.18 to .75 with scores for the various factors, with a mean of .54. Six
of his nine correlations were significant beyond the 5 per cent level
and so was the mean. From Table 2 it appears that rater V judged
scientists in the order of scores of the factor of eduction of conceptual
relations, regardless of trait, while other raters judged scientists
in more or less random orders with respect to that factor.
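The count of significant correlations for rater Z can be checked with a short sketch. It assumes rater Z's column in Table 1 reads .38, .63, .52, .75, .18, .35, .73, .58, .52, which is the reading consistent with the stated range of .18 to .75 and mean of .54, and it uses N = 17 from the table note. The test is the large-sample Fisher z approximation, an assumption of this sketch and not necessarily the authors' procedure.

```python
import math
from statistics import NormalDist


def is_significant(r, n, alpha=0.05):
    """Two-tailed test of r against zero, Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return abs(z) >= NormalDist().inv_cdf(1 - alpha / 2)


# Rater Z's nine correlations with the Logical Reasoning test (assumed reading).
rater_z = [0.38, 0.63, 0.52, 0.75, 0.18, 0.35, 0.73, 0.58, 0.52]
n_significant = sum(is_significant(r, n=17) for r in rater_z)
print(n_significant)
```

Six of the nine values pass the 5 per cent criterion under this approximation, in agreement with the text.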

The alternative hypothesis is that these patterns of correlation
levels are determined by systematic rating errors that just happen
to correlate with selected factor tests. One fact that supports this
hypothesis is that statistically insignificant correlations for a rater
are often of similar size and algebraic sign. Note the uniformly
negative correlations for raters T and Y in Table 1, all of which are
insignificant. Although we did not take the trouble to intercorrelate
the ratings of trait variables for each rater separately, it can be
rather confidently predicted that they would be generally high, in
line with the intercorrelations of composite ratings. Such strong
intercorrelations of a rater’s trait variables would ensure that all of
them would tend to correlate similarly with each test. These consistencies
might reflect meaningful halo effects, but the bias in the
direction of one or two factors may be merely coincidental. Another
indication favoring this hypothesis is that although the communality
of the Verbal Analogies Test is of the order of .40, in previous factor
analyses, this test correlates as high as .81 with rater V’s ratings of
one trait.
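One way to read the communality argument, offered here as a gloss that the article does not spell out: if a test's correlation with another variable arose only through common factors, it could not exceed the square root of the test's communality. The sketch below compares that ceiling with the observed value; the variable names are illustrative.

```python
import math

communality = 0.40   # order of magnitude reported for the Verbal Analogies test
observed_r = 0.81    # highest correlation with rater V's ratings of one trait

# Ceiling on a correlation that runs only through common-factor variance.
ceiling = math.sqrt(communality)
exceeds_ceiling = observed_r > ceiling

print(round(ceiling, 2), exceeds_ceiling)
```

Because .81 is above the ceiling of about .63, part of the observed correlation would have to involve variance outside the common factors, which fits the suggestion that it is coincidental.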

Whatever hypothesis concerning the individual rater biases is 
correct, the ratings are apparently worthless as criterion measures. 
These findings, though not entirely unprecedented, dramatically 
suggest that many other ratings are beset with similar biases, even 
when obtained under what appear to be favorable conditions. The 
investigator would be oblivious to such defects unless a searching 
examination were made of the ratings. High reliability coefficients 
would not necessarily indicate the absence of biases, since biases 
may be partly responsible for the high reliability. 

In a study by Guilford and Holley (1949) we find another exam- 
ple of the miscarriage of instructions given to raters. In that study, 
when raters gave judgments of esthetic objects under different in- 
structions, it was found by factor analysis that instructions had 
practically no effect. The raters apparently emphasized the same 
features in making their ratings, regardless of instructions, and the 
factors obtained were generally not in line with the instructions. 
For example, some raters emphasized in common a preference for 
romantic or adventurous scenes, and still others favored other 
themes. The Guilford-Holley study is cited also because it demon- 
strates how factor analysis may be utilized in discovering what is 
going on in the minds of the raters pertaining to their criteria of 
judgment. 

Incidentally, one important implication of the present study, as 
well as that of Guilford and Holley, is that ratings are very poor 
material to use alone as variables in factor analyses where the purpose
is to discover primary traits of personality. The factors obtained
from trait ratings are likely to reflect what is in the minds
of the raters rather than basic traits of personality. Actual traits
are confused because of logical errors and failure to make distinctions.
In spite of definitions of terms, provided by the investigator,
raters often harbor stereotyped misconceptions. As may have been
true of raters T to Z in Tables 1 and 2, raters may be judging under
the influence of halos determined by one or another of the more
observable traits or more favored traits. For these various reasons,
factors found from ratings may represent largely semantic ideas,
misconceptions, or descriptive preferences rather than psychological
dimensions of personality.

To return to the general problem of practical validation of factors, 
there is another difficulty worth pointing out. It stems from the fact 
that very few tests are factorially pure. On the one hand, this favors 
correlations of significant size. On the other hand, it leaves us with 
ambiguous conclusions. Suppose a test measures two factors sub- 
stantially and that its correlation with a criterion is statistically 
significant. Which factor is responsible for the correlation? Are both 
responsible, and, if so, to what extent is each responsible? The best 
way to obtain a clarifying answer to these questions is to apply a
factor analysis to the test and the criterion in the same matrix with
other variables. This statement holds for objectively measured
criteria as well as for ratings. Such an analysis calls for a careful
selection of tests and of criterion measures as would be true for any
factor analysis, if we want an interpretable solution. Test-score 
variables put into the analysis to identify factors will probably have 
to carry the burden of interpretation of factors found in ratings and 
other criteria. 

The current practice of validating tests without reference to fac- 
tors, of course, avoids the problems cited in this article. But it does 
so by evading the issue of validity of criteria. In validation studies, 
as in all scientific research, the best policy is to know all we possibly 
can about all the data we have. Even then, we shall often know too
little. There are doubtless some investigators in practical situations
who do not care to know their variables in analytical terms. The 
scientifically oriented and thorough investigator will care. 


Summary 
Ratings of proficiency of personnel are frequently used as criteria
against which to validate test scores. Such ratings, and those used
for other purposes, are usually assumed to be valid; it is assumed
that we know what they measure.

Evidence is presented to indicate that ratings may often measure
variables other than those it is intended that they measure or those
it is believed that they measure.

It is recommended that wherever possible, a validation study, 
particularly one that includes ratings, should be carried out in the 
form of a factor analysis, with a number of suitable marker tests 
included in the battery in order to identify all likely common 
factors. It is strongly urged that ratings of abstract traits not be 
used in factor analysis where the objective is to discover new 
factors. 


REFERENCES 


Green, R. F., Guilford, J. P., Christensen, P. R. and Comrey, A. L.
“A Factor-Analytic Study of Reasoning Abilities.” Psychometrika,
XVIII (1953), 135-160.

Guilford, J. P. and Holley, J. W. “A Factorial Approach to the
Analysis of Variances in Esthetic Judgments.” Journal of Experimental
Psychology, XXXIX (1949), 208-218.

Wilson, R. C., Guilford, J. P., Christensen, P. R. and Lewis, D. J.
“A Factor-Analytic Study of Creative-Thinking Abilities.” Psychometrika,
XIX (1954), 297-311.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT
Vol. XXII, No. 3, 1962


SUBTLE, OBVIOUS AND STEREOTYPE MEASURES
OF MASCULINITY-FEMININITY¹


ROBERT C. NICHOLS²
Purdue University 


MASCULINITY and femininity pose interesting measurement problems.
In contrast to the definition problems encountered with most
other intuitive psychological constructs, there is a readily available
external criterion upon which to base an operational definition.
Clearly, masculinity must involve behavior that is similar to that
typically observed in men and femininity must involve behavior 
that is similar to that typically observed in women. Most mascu- 
linity-femininity (MF) scales, including those of Gough (1952), 
Guilford (1936), Heston (1949), and Terman (1936), have been 
based on this operational definition of sex difference. 

Scales based on items showing sex differences have not been found 
to have the desired high positive correlations with each other and 
Heston (1948) has suggested that MF items based on interests seem 
to form one cluster, while those based on personality characteristics 
seem to form another. Webster (1956, 1957) was disappointed with 
the low internal consistency of items selected on the basis of sex 
difference, and he found that he could improve reliability consider- 
ably by dividing MF items into three scales on the basis of item 
content. Two of the scales so constructed were found by Webster to
have a negligible correlation with each other in a sample of a single 
sex. (A sizable negative correlation between these scales was found 
in the present study.) 

One possible source of this lack of internal consistency, in personality
inventory items showing sex differences, may lie in the discrepancy
between self-description and other forms of behavior.
There are socially prescribed sex roles for many of the behaviors
inquired about by inventory items. Since males in general tend to
report behavior socially prescribed as masculine more frequently
than females and vice versa, many items will show a sex difference
in response largely because of these social conventions. Thus, individual
differences in response to these items within a single sex will
reflect the degree to which a person’s self-description conforms to
the socially prescribed sex role for his sex. On the other hand, responses
to inventory items are in themselves a sample of behavior
and, if a person responds in a manner typical of one sex, he is by
that fact showing masculine or feminine behavior. In the typical
MF scale these two influences are confounded since most items on
which there are sex differences inquire about behaviors for which
there are socially prescribed sex roles. Yet, there is no a priori reason
to expect a strong relationship between the sex role one attributes
to himself and the actual similarity of his behavior to that
typical of one sex or the other.

¹ This study was supported in part by a grant (M-2734) from the National
Institute of Mental Health.
² Now at National Merit Scholarship Corporation, Evanston, Illinois.

The purpose of the present study was to isolate the aspect of MF 
concerned with actual behavioral similarity to a given sex from the 
aspect concerned with self-attributed sex role, to develop measures 
of each, and to study their relationship with each other and with 
several MF scales in current use.


Procedure 


Masculinity-femininity items were taken from the MMPI (Hath- 
away & McKinley, 1942), the Heston Personal Adjustment Inven- 
tory (Heston, 1949), the VC Attitude Inventory (Webster, 1957), 
the Guilford-Martin (Guilford & Martin, 1943), the California 
Psychological Inventory (Gough, 1957), from Gough (1952), and 
a number of additional items were written based on MF items re- 
ported by Terman and Miles (1936) and from the Strong Voca- 
tional Interest Blank (Strong, 1938). The Guilford and Heston 
items were paraphrased to change the yes-no response to a true- 
false response to conform with the item form of the other items. The 
resulting 356-item MF test was administered to 100 male and 100 
female introductory psychology students at Purdue University, and
phi coefficients were calculated to show the ability of each item to
differentiate the sexes. The same 356-item test was next administered
to two additional classes of 48 and 64 introductory psychology
students of both sexes with special instructions. These subjects were
told that all 356 items had been found to differentiate males from
females. One group was asked to mark the answer given most frequently
by males while the other was asked to mark the answer
given most frequently by females. Phi coefficients between these two
groups were computed to show the amount of agreement concerning
what response is to be expected from each sex.³ The items were then
plotted on a graph with the sex difference phi coefficients represented
along the horizontal axis and the faked or stereotype phi coefficients
on the vertical axis. This scatterplot is shown in Figure 1.
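For readers who want the computation behind an item's phi coefficient, here is a minimal sketch using the standard fourfold-table formula. The counts in the example are hypothetical (70 of 100 males and 40 of 100 females answering "true") and are not taken from the study.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table.

    a = group 1 answering true,  b = group 1 answering false,
    c = group 2 answering true,  d = group 2 answering false.
    """
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


# Hypothetical item: 70 of 100 males and 40 of 100 females mark "true".
phi = phi_coefficient(70, 30, 40, 60)
print(round(phi, 2))
```

A positive value here means the "true" response is more common in the first group; the same function applies to the two instructed groups used for the stereotype coefficients.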


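[Figure 1: scatterplot. Horizontal axis: sex difference phi coefficient. Vertical axis: stereotype phi coefficient. One region is labeled OBVIOUS ITEMS.]

Figure 1. Scatterplot of the sex difference and stereotype phi coefficients of
the 356 MF items.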


3 These stereotype phi coefficients were obtained by Barbara Udell (1959). 




On the basis of their position on the scatterplot shown in Figure 1, 
items were selected to compose four experimental scales:⁴


(a) Items falling in the upper right and lower left quadrants are 
those in which there is a true sex difference that is generally rec- 
ognized and agreed upon by college students. These items were 
used to form an Obvious MF scale. 

(b) Items falling along the horizontal axis and some distance 
from the origin are those in which there is a true sex difference 
that is not generally known to college students. These items (al- 
though fewer than expected) were used to form a Subtle MF 
scale. 

(c) Items falling along the vertical axis and some distance from 
the origin are those for which no true sex difference exists, but 
for which a sex difference is generally assumed by college students.
These items were used to form an MF Stereotype scale.

(d) Items clustering around the origin are those for which a sex
difference neither exists nor is presumed by college students to 
exist. Since Webster (1957) has found acquiescence response set 
to be highly correlated with some aspects of MF, these items 
were all scored true as a measure of response set independent of 
either true or stereotype MF. 
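The four selection rules can be sketched as a small classifier over each item's pair of phi coefficients. The cutoffs are assumptions of this sketch: .14 and .19 approximate the 5 per cent significance points of phi for groups of 200 and 112, and the ±.30 and ±.10 limits for the Stereotype scale follow the values given later in the article. The exact rules used for the Obvious and response-set scales are not stated in this section, so those branches are hypothetical.

```python
def classify_item(sex_phi, stereotype_phi, sex_cut=0.14, stereotype_cut=0.19):
    """Assign an item to one of the four experimental scales (illustrative rules)."""
    sex_significant = abs(sex_phi) >= sex_cut
    stereotype_significant = abs(stereotype_phi) >= stereotype_cut

    # (a) True sex difference that students also expect, in the same direction.
    if sex_significant and stereotype_significant and sex_phi * stereotype_phi > 0:
        return "obvious"
    # (b) True sex difference that students do not expect.
    if sex_significant and not stereotype_significant:
        return "subtle"
    # (c) Expected difference with essentially no true difference.
    if abs(stereotype_phi) > 0.30 and abs(sex_phi) < 0.10:
        return "stereotype"
    # (d) Neither a true nor an expected difference.
    if not sex_significant and not stereotype_significant:
        return "response set"
    return "unclassified"


for pair in [(0.40, 0.60), (0.20, 0.05), (0.03, 0.50), (0.02, 0.03)]:
    print(pair, classify_item(*pair))
```

Items that fall between the regions, such as a weak true difference paired with a moderate stereotype, are left unclassified in this sketch.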


The responses of the 100 male and 100 female subjects mentioned 
above were scored for the experimental scales and for the MF scales 
from which the items were taken, and correlation matrices were 
computed for the male and female subjects separately. These corre- 
lations are shown in Table 2. 

The correlations between the experimental scales shown in Table 2
may be partly spurious because they are calculated on the same
sample used to select the items. Therefore the MF test was administered
to an additional sample of 111 male and 102 female introductory
psychology students. The correlations between the experimental
scales and point biserial correlations with sex based on this
cross-validation sample are shown in Table 3. Cross-validated reliabilities
were calculated using the Kuder-Richardson Formula 21
and are shown in Table 1.

⁴ The items composing the Subtle, Obvious, and Stereotype scales, along
with the item statistics, have been deposited with the American Documentation
[...] or $1.25 for photocopies from Chief, Photoduplication Service, Library of [...]
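As a reminder of what Kuder-Richardson Formula 21 computes, the sketch below implements the usual form, which needs only the number of items, the mean total score, and the variance of total scores. The example values are hypothetical and do not come from the study.

```python
def kr21(n_items, mean_score, score_variance):
    """Kuder-Richardson Formula 21 reliability estimate.

    KR-21 = k / (k - 1) * (1 - M * (k - M) / (k * variance)),
    where k is the number of items and M the mean total score.
    """
    k = n_items
    return (k / (k - 1)) * (1 - mean_score * (k - mean_score) / (k * score_variance))


# Hypothetical 30-item scale with mean 15 and total-score variance 20.
print(round(kr21(30, 15.0, 20.0), 2))
```

The formula assumes items of roughly equal difficulty, so it tends to give a lower value than estimates based on individual item statistics.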


The Subtle Scale 


Inspection of Figure 1 reveals a very high relationship between 
the ability of items to differentiate the sexes and the general aware- 
ness that the sex difference exists. Since there are very few items in 
which there is a true sex difference that is not known to under- 
graduate students, the intention to construct a subtle MF scale was 
largely frustrated. There were only 30 items with a sex difference 
significant at the .05 level and a stereotype difference not significant 
at the .05 level. Since none of these sex difference phi coefficients is 
very large, it is to be expected that the scale contains many chance 
items. KR-21 reliability computed on the same sample of 200 sub-
jects used in constructing the scale (and thus spuriously high) was
.48. This reliability shrank to .14 on cross-validation. The cross-
validated reliabilities in samples of a single sex were .07 and .10 for
males and females, respectively. The point biserial correlation with
sex was .49 in the original sample, which shrank to .28 on cross-
validation.
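
As a rough check on these figures, a KR-21 estimate needs only the number of items, the scale mean, and the scale standard deviation. The sketch below applies the standard KR-21 formula to the cross-validation means and standard deviations given for the Subtle scale in Table 3. The function name `kr21` is illustrative, and it is an assumption that the printed standard deviations are the ones that entered the original computation.

```python
def kr21(n_items, mean, sd):
    """Kuder-Richardson Formula 21 reliability estimate.

    Assumes dichotomously scored items of roughly equal difficulty.
    """
    variance = sd ** 2
    return (n_items / (n_items - 1)) * (
        1 - mean * (n_items - mean) / (n_items * variance)
    )

# Subtle scale: 30 items; cross-validation means and S.D.s from Table 3.
males = kr21(30, 15.68, 2.83)
females = kr21(30, 17.33, 2.84)
print(round(males, 2), round(females, 2))
```

Rounded to two places, these come to .07 for males and .10 for females, the single-sex values quoted above.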

The Subtle scale is thus far from being a satisfactory scale for 
measuring the subtle component of MF. However, because of its 
significant, though quite low, validity, the Subtle scale was in- 
cluded in subsequent analyses for possible suggestive results. 


The Stereotype Scale 


As can be seen from Figure 1, there were many items on which 
there is good agreement on a sex stereotype and which show no sex 
difference. The Stereotype scale, composed of the 61 items with
stereotype coefficients greater than +.30 and sex difference coeffi-
cients less than +.10, was found to have a KR-21 reliability of .69
in the total original sample, and reliabilities of .73 and .67 in the
original male and female samples, respectively. The cross-validated
reliabilities shown in Table 1 are quite similar to those values. The
correlation with sex of .06 found in the original sample increased to
.44 on cross-validation. This is considerably higher than was ex-
pected and further studies to lower this correlation with sex are
needed.

The item content of the Stereotype scale reveals that females are 




considered to be more introspective and dreamy, less practical and 
able to make decisions, more emotional and excitable, more sensitive 
and easily hurt yet at the same time more patient and understand- 
ing, more religious, more suspicious, more conforming, more moral 
and to have more neurotic symptoms than men on items that 
actually show little sex difference. Some of these content categories 
do not appear at all in items showing large sex differences and indi-
cate real disagreement between the stereotype and actual response.
In this category are the neurotic symptoms, the suspiciousness, and 
the morality. The other content categories also appear among the 
items showing sex differences, and in these cases the stereotype is in 
error by exaggerating the extent of the sex difference in a particular 
trait. 

The item content of the Stereotype scale is interesting for the 
light that it sheds on the popular conception of sex roles. The exact 
meaning of these items is ambiguous, however, since it is not known 
whether they represent popular misperceptions of behavior typical 
of each sex or whether they are indeed accurate perceptions of true 
sex differences in behavior which do not show up as sex differences 
in item response due to lack of veridical self-description. 

This difficulty in interpreting item content should not confuse the 
meaning of the test score, however, since an attempt to give feminine 
responses, for example, would tend to raise the stereotype score. 


The Obvious Scale 


As can be seen from Figure 1, college students are correctly aware 
of almost all the sex differences found in these items. The 58 items
with stereotype phi coefficients greater than +.60 and sex difference
phi coefficients greater than +.20 were included in the Obvious scale.
This scale was found to have a KR-21 reliability of .88 in the total
original sample, which dropped to .57 for males and .75 for females
in the original samples of a single sex. These coefficients are similar
to the cross-validated values shown in Table 1. The point biserial
correlation with sex was .82 in the original sample and .85 on cross-
validation.

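
The point biserial correlations with sex can be recovered, to rounding, from group sizes, means, and standard deviations alone. The sketch below does this for the cross-validation values reported in Table 3. The function name is illustrative, and treating the printed standard deviations as population (divide-by-N) values is an assumption rather than the author's stated procedure.

```python
from math import sqrt

def point_biserial(n0, mean0, sd0, n1, mean1, sd1):
    """Point biserial r between a score and a two-group variable,
    computed from group summary statistics (population S.D.s assumed)."""
    n = n0 + n1
    p, q = n0 / n, n1 / n
    grand_mean = p * mean0 + q * mean1
    # Total variance = within-group variance + between-group variance.
    total_var = (p * (sd0 ** 2 + (mean0 - grand_mean) ** 2)
                 + q * (sd1 ** 2 + (mean1 - grand_mean) ** 2))
    return (mean1 - mean0) / sqrt(total_var) * sqrt(p * q)

# Cross-validation sample: 111 males (group 0), 102 females (group 1).
obvious = point_biserial(111, 16.17, 5.93, 102, 35.55, 6.16)
subtle = point_biserial(111, 15.68, 2.83, 102, 17.33, 2.84)
print(round(obvious, 2), round(subtle, 2))
```

The two values round to .85 for the Obvious scale and .28 for the Subtle scale, in agreement with the cross-validation figures given in the text.
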
Response Set Scale 


Webster (1957) has reported very high correlations between a
response set scale composed of items all scored true and his MF III
or “feminine sensitivity” scale. Webster recognized that much of the
correlation was due to similar or overlapping content, but just how
much of the covariance is due to this source is not known. In the
arbitrary selection of any group of items for the construction of a
response set scale, one will get an unknown amount of interaction
with the content of the items. However, the items falling near the
origin in Figure 1 have been demonstrated to be independent of any
real or presumed sex difference, and it was reasoned that a response
set scale constructed from these items could not have a spurious
correlation with MF due to common content.


TABLE 1
Cross-Validated Reliabilities (KR-21) of the Experimental Scales

                              Sample
                   Male        Female      Both Sexes
Scale              N = 111     N = 102     N = 213

Subtle MF          .07         .10         .14
Stereotype MF      .66         .70         .71
Obvious MF         .68         .65         .94
Response Set       .00         .00         .24

The 39 items with neither the stereotype nor the MF phi coeffi- 
cients reaching the .05 level of significance were all scored true as a 
measure of acquiescence response set. The resulting scale had near 
zero internal consistency in both the original and cross-validation 
samples. The failure to find internal consistency in this scale, when 
items were selected on the basis of no sex stereotype or sex differ- 
ence, may suggest that previous response set scales are confounded 
with content and gain internal consistency in this way. In many
personality scales the true response refers to some neurotic com-
plaint or difficulty more frequently than does the false response.
Thus, if all items were keyed true and the subsequent scale item
analyzed for homogeneity as was done by Webster (1956), the re-
sulting scale might be composed essentially of neurotic complaint
items, as are heavily weighted on the present Stereotype scale (79%
of the stereotype items are scored true). In support of this possibility
is the finding of a high correlation between the Stereotype scale and
MF III (.53 for females and .61 for males), the scale found by
Webster to correlate with response set. Surprisingly enough the
present response set scale, in spite of its lack of internal consistency,
was found to correlate .40 for females and .25 for males with MF III
in the original sample. These were the highest correlations found
for the response set scale with any of the other measures.


TABLE 2
Correlations Among the MF Scales

[Table body illegible in the source scan. Scales, in order: 1. Guilford;
2. MMPI MF; 3. CPI Fe; 4. Gough MF; 5. Heston; 6. Webster MF I
(conventionality); 7. Webster MF II (passivity); 8. Webster MF III
(feminine sensitivity); 9. Subtle MF; 10. Stereotype MF; 11. Obvious MF;
12. r with sex (N = 200). Means and S.D.s are given for males and females.
Male S's (N = 100) above the diagonal; female S's (N = 100) below the
diagonal. Decimals omitted from the correlations.]



Correlations Among MF Scales 


Table 2 shows the intercorrelations among the various MF meas- 
ures used in the present study. The numbers above the diagonal rep- 
resent correlations calculated on the original sample of 100 male 
subjects, and those below the diagonal on the original sample of 
100 female subjects. The finding of previous investigators of gen- 
erally low correlations among MF scales is clearly confirmed by 
these figures.

Correlations of the three experimental scales constructed in this 
study with Webster’s three content scales reveal an interesting 
parallel between MF I and the Obvious scale, MF II and the Subtle 
scale, and MF III and the Stereotype scale. The content of these 
scales corresponds rather well with the content of Webster's scales, 
the Obvious scale being heavily loaded with items reflecting conven- 
tional occupational and sex roles, the Subtle scale containing items 
reflecting lack of aggressiveness, and the Stereotype scale concerned 
with fantasy, introspection, and neurotic trends. Thus, the division 
of MF items on rational grounds by Webster seems to have isolated 
somewhat the same clusters of items as those found by empirical 
means in this study. 

The correlations between the experimental scales shown in Table
2 are perhaps spurious because these correlations are based on the
same sample that was used in selecting the items for these scales.
Table 3, however, shows the correlations between the experimental
scales based on the cross-validation group. Since there is no item
overlap among the scales, these correlations represent unbiased esti-
mates of the correlations among the traits measured by these scales
in this population. Of special interest in Table 3 is the negative
correlation (-.44 for females and -.49 for males) between the
Subtle and Stereotype scales. This negative relationship is espe-
cially interesting since both scales correlate positively with sex. If
the reasoning behind the construction of these scales is correct, the
question arises concerning why there should be a negative relation-
ship between the similarity of one’s behavior to that typical of the
opposite sex and the sex role one attributes to himself in self-descrip-
tion. Since the intuitive concept of psychological femininity is
clearly a unitary one, one wonders which of these negatively corre-
lated aspects of sex role corresponds most closely to the construct.
The answers to these important questions will require additional
data and more adequate subtle and stereotype scales than the pres-
ent ones. However, within the limitations of the present data some
suggestive interpretations can be made.


TABLE 3
Correlations among the Experimental Scales in the Cross-Validation Sample

                       Males           Females          Correlations
                       N = 111         N = 102
Scale                Mean    S.D.    Mean    S.D.        1       2
1. Subtle MF         15.68   2.83    17.33   2.84               -.44    -.
2. Stereotype MF     24.02   6.06    30.27   6.92      -.49
3. Obvious MF        16.17   5.93    35.55   6.16      -.08     .19
4. Response Set      16.41   2.79    17.41   3.03      -.21     .10
5. r pbis with Sex                                      .28     .44
   (N = 213)

Male S's (N = 111) below the diagonal; female S's (N = 102) above the diagonal.


The notion of defensiveness in self-description is a possible expla-
nation for the negative correlation between Subtle and Stereotype
scales. It may be that those whose behavior is similar to that typical
of the opposite sex are sensitive about this and tend to exaggerate
behavior felt to be typical of their own sex in those aspects of be-
havior that are under conscious control, the clinically familiar
“weight lifter’s syndrome.” If defensiveness is responsible for the
negative correlation under discussion, it would be expected to
operate only in those subjects whose behavior is most similar to


scatterplots of the correlation between these two scales were con-
structed for male and female subjects separately. The expected
curvilinear relationship between the two scales is clearly evident in
the female sample. The 25 per cent of the female subjects with the
lowest (most masculine) scores on the Subtle scale uniformly have
high (feminine) scores on the Stereotype scale, and there is little
relationship between the two scales in the other 75 per cent of the
subjects. There is no indication of such a curvilinear relationship
among the males. There is, however, a single case among the origi-
nal 100 males that raises an interesting question. This subject ob-
tained the highest (most feminine) Subtle MF score of the group
(4.25 from the mean) and also obtained the highest score among
the males on the Stereotype scale. If the defensiveness hypothesis is
a correct explanation for the negative relationship between Subtle
and Stereotype scales, then the clinician would expect to find a few
very feminine males, the overt homosexuals, who are not defensive
about their sex role but rather attempt to conform to the stereotype
of femininity. Thus, in a group of overt homosexuals the Subtle and
Stereotype scales may be positively correlated.

The correlations shown in Table 2 of various MF scales with the 
Subtle and Stereotype scales may help to clarify the meaning of 
these two experimental scales. In general the scales that have been 
constructed on the basis of sex differences, namely the Guilford, the 
Heston, and the Gough scales, show sizable correlations with the 
Stereotype scale and near zero correlations with the Subtle scale in 
samples of both sexes. On the other hand, the MMPI MF scale, 
which is composed of items found to differentiate homosexual from 
nonhomosexual males, correlates higher with the Subtle than with 
the Stereotype scale. The CPI Fe scale, which is a subset of the 
Gough items, seems more similar to the MMPI scale in this pattern 
of correlations than to the other sex difference scales. This is diffi- 
cult to interpret since it is not clear from the CPI manual how the 
subset of items was selected from the longer scale. 

The above reasoning, based on admittedly inadequate evidence, 
suggests that the subtle items may come closest to measuring the 
intuitive concept of masculinity-femininity, and that the Stereotype 
scale, because of defensiveness, may come closer to indicating the
opposite in a normal group. One wonders what this may mean for 
MF scales in current use, which are composed mainly of obvious 
items presumably embodying both subtle and stereotype com- 
ponents. The Obvious scale has a sizable correlation with the 
Stereotype scale in both sexes and a near zero correlation with the 
Subtle scale. If one is willing to assume that these correlations are
not due to defects in the scales, it seems probable that the response 




to the obvious items is more determined by the stereotype than the 
subtle component. Thus, present MF scales may serve very ade- 
quately in identifying overt homosexuals, where subtle and stereo- 
type components are expected to be positively correlated, and yet 
actually have a negative correlation with MF in the normal range. 

Although the above conclusions can only be very tentatively 
stated on the basis of the present data, it seems clear that further 
research in this area should be directed toward isolating the various 
components of the MF variable. An adequate subtle scale is needed, 
and, since good subtle items are hard to find, it may be necessary to 
construct it by adding correlating, non-MF items or by using per- 
ceptual or other items that do not involve self-description. When a 
good subtle scale is available, it may very well be found that the 
difference between subtle and stereotype scores is the most interest- 
ing aspect of the MF variable. 


Summary 


Three experimental MF scales were constructed on the basis of
sex difference in item response and the general expectation of a sex
difference. The Subtle scale was composed of items showing sex
differences which people are generally not aware of; the Stereotype 
scale was composed of items showing no sex difference but for which 
there was a general expectation of a sex difference; and the Obvious 
scale was composed of items showing a sex difference of which people 
are generally aware. The Subtle scale was found to have low in- 
ternal consistency, probably because of the small number of good 
subtle items available. The Subtle and Stereotype scales correlated 
—.44 for female subjects and —.49 for male subjects. Possible ex- 
planations for this negative relationship, as well as other correla- 
tions among various MF scales, were discussed. 


AND PSYCHOLOGICAL MEASUREMENT, XII (1952), 427—439. 
Gough, H. G. Manual for the California Psychological Inventory.
Palo Alto: Consulting Psychologists Press, 1957.

Guilford, J. P. and Guilford, R. B. “Personality Factors S, E, and
M and Their Measurement.” Journal of Psychology, II (1936),

Guilford, J. P. and Martin, H. Inventory of Factors GAMIN.
Beverly Hills: Sheridan Supply Co.

Hathaway, S. R. and McKinley, J. C. Manual for the Minnesota
Multiphasic Personality Inventory. Minneapolis: University of
Minnesota Press, 1942.

Heston, J. C. “A Comparison of Four Masculinity-Femininity
Scales.” EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, VIII
(1948), 375-387.

Heston, J. C. Heston Personal Adjustment Inventory. New York:
World Book Company, 1949.

Shepler, B. “A Comparison of Masculinity-Femininity Measures.”
Journal of Consulting Psychology, XV (1951), 484-486.

Strong, E. K. Vocational Interest Blank for Men. Stanford: Stan-
ford University Press, 1938.

Terman, L. M. and Miles, C. Sex and Personality: Studies in Mascu-
linity and Femininity. New York: McGraw-Hill, 1936.

Udell, B. “Subtle and Obvious Measures of Masculinity-Femininity.”
Unpublished M. S. thesis, Purdue University, 1959.

Webster, H. “Personality Development During the College Years:
Some Quantitative Results.” Journal of Social Issues, XII
(1956), 29-43.

Webster, H. Research Manual: VC Attitude Inventory and VC
Figure Preference Test. Mimeographed, Mary Conover Mellon
Foundation, Vassar College, 1957.



EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 3, 1962


A GENERAL FACTOR OF SOCIAL DESIRABILITY
IN THE HIGH SCHOOL¹


C. C. ANDERSON AND R. E. TRAUB
University of Alberta 


Messick (1960), while discussing a factor analysis of the social 
desirability judgments of 42 items representing Murray's 14 psy- 
chological needs made by a group of “manifestly disturbed mental 
hospital patients," states: 

The multidimensionality of social desirability ratings, the major find-
ing of the present study, naturally must be crossvalidated on normal
populations. The stability and generality of the particular nine factors
found here would be of interest, but it might be expected that several 
new and different points of view would be uncovered by varying the type 
of subjects sampled and items rated. (1960, p. 286) 

The present research arises from this and involves the application 
of an extension of Messick's questionnaire to adolescents in Grades 
X, XI, and XII in four rural high schools. Three hypotheses were 
tested. The first is that a general factor of social desirability can be 
extracted from the intercorrelations of social desirability judgments 
made by the total sample, and that this factor will be increasingly
general, having greater weight at Grade XII than at Grade X. 

This hypothesis is based on three sources: first, an observation 


¹The authors are indebted to Steve Hunka for assisting with the factor-
analysis and giving advice on it; to Dr. S. Messick of Educational Testing
Service for permission to mimeograph his Trait Rating Form, and to Dr. A. L.
Edwards and The Psychological Corporation for permission to administer
Messick's version of the Edwards Personal Preference Schedule. We should 
also like to state that, since there is evidence to indicate that item responses 
obtained to selected items isolated from the content of the Personal Prefer- 
ence Schedule are not comparable to those obtained within the context, the 
results of this research cannot be considered applicable to the standardized 
complete form of the Personal Preference Schedule. 




made by a colleague, R. J. C. Harper (personal communication),
derived from some unpublished work on the semantic differential
with female neurotics, that introverted neurotics (dysthymics)
make markedly more discriminating judgments about social con-
cepts than normals. From this suggestion, that normals make less
fine discriminations than neurotics, it is possible to infer that the
concept of social desirability will be markedly more general for
normals than for neurotics and that the present total sample of
normal adolescents will yield intercorrelations providing evidence
of such a factor. A second source for the hypothesis was some as
yet unpublished research by the present senior author (C. C. Ander-
son) which showed a progressive increase from Grades VII to XI
in preference for conventional moralistic attitudes and traditional
values by adolescents in Alberta. Such a trend, if representative,
would sustain consistent evaluation in terms of social desirability
of many of the needs in Messick's list, and a corresponding increase
in generality of the social desirability factor in Grades X through
XII would be expected. A third source was the generally accepted
fact that rural adolescents are more naively open than their urban
counterparts to parental decisions and values concerning inter-
personal matters (Wheelis, 1958, pp. 90-91). In this case, signs of a
general factor at even the Grade X level might be expected.

A second hypothesis, that girls would make a greater contribution
to this factor than boys, came from Davis's assertion that “. . . in




structure using the oblimax criterion. There was a characteristic 
trend, due to the relative dullness of early drop-outs, towards sig-
nificantly higher intelligence in the Grade XII (mean I.Q. = 110)
as against the Grade X samples (mean I.Q. = 103). 


Results 


Evidence of a general factor of social desirability depends partly 
on the criterion adopted. Messick assumes that a general factor is 
not indicated if any of the unrotated centroid factors account for 
only about 20 per cent of the total variance. Corresponding figures 
from the present research are presented in Table 1 for the centroid 
factors in each of the six matrices (two sexes at three grade levels). 

The hypothesized general factor of social desirability underlying 
the responses of the total adolescent sample and the trend towards 
increasing generality from Grade X to Grade XII are clearly illus- 
trated from the loadings of the first factor. Extrapolation from the 
present data would enable us to expect a sizable general factor of 
social desirability to be extracted from the intercorrelations of judg- 
ments of social desirability at the level of adult normals. 
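
The comparisons drawn from the first-factor column of Table 1 can be checked mechanically. The sketch below is only an illustration: the dictionary layout and variable names are assumptions, and the 20 per cent figure is used simply as the benchmark attributed to Messick above.

```python
# First centroid factor: percentage of total variance (Table 1, column 1).
first_factor = {
    ("X", "boys"): 24.49, ("X", "girls"): 23.72,
    ("XI", "boys"): 28.34, ("XI", "girls"): 25.23,
    ("XII", "boys"): 31.82, ("XII", "girls"): 27.09,
}
grades = ["X", "XI", "XII"]

# Every group exceeds the benchmark of about 20 per cent.
above_benchmark = all(pct > 20.0 for pct in first_factor.values())

# Within each sex the share rises from Grade X to Grade XII.
rising = all(
    first_factor[(a, sex)] < first_factor[(b, sex)]
    for sex in ("boys", "girls")
    for a, b in zip(grades, grades[1:])
)

# At each grade the boys' first factor is larger than the girls'.
boys_larger = all(
    first_factor[(g, "boys")] > first_factor[(g, "girls")] for g in grades
)

print(above_benchmark, rising, boys_larger)
```

All three checks hold for the tabled values, which is the pattern the text describes: a first factor above the benchmark in every group, growing with grade level, and larger for boys than for girls.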

The psychological basis of this general factor can be suggested 
by listing for a variety of groups (total sample, Grade XII boys, 
Grade XII girls, Grade XI boys, and Grade X boys) the distribution 
of significant positive and negative centroid loadings on the general 
factor which are associated with Murray’s needs.


TABLE 1


Percentages Contributed to the Total Variance by Centroid
Factors in Each of the Six Matrices


                                     Factors
Grade Groups       1       2       3       4       5       6       7
X boys           24.49   19.86   10.64   11.59   11.05    9.69    9.24
X girls          23.72   17.41   14.14   11.03   10.86   10.16    7.6
XI boys          28.34   16.48   14.55   10.96    8.59    7.94    7.0
XI girls         25.23   16.06   14.05   12.38   11.06    8.84    9.5
XII boys         31.82   19.2    11.95   10.47    8.2     8.6     7.9
XII girls        27.09   13.44   13.8    12.32  [illegible] 9.0   8.5

Total Sample     34.62   16.52    7.45    7.65    5.71    5.14    5.06





TABLE 2


Distributions of Significant Positive and Negative Centroid
Loadings Associated with Murray’s Needs


[Table body illegible in the source scan. Legible need headings:
Exhibitionism, Achievement, Changeability, Order, Affiliation,
Nurturance, Intraception, Succorance, Abasement, Endurance.]


Table 2 merits two comments: first, the first centroid factor at
Grade X level deserves to be labeled an emergent general factor
because of its psychological similarity to its more developed coun-
terpart in Grade XII. Secondly, adolescents are consistent in their
social desirability judgments of what might be termed authori-
tarian needs (Sanford, 1956), namely Heterosexuality, Autonomy,
Dominance, Aggression, and Exhibitionism, but this is not true of
the relatively “tender-minded” needs: Affiliation, Succorance,
Abasement, and Nurturance. Presumably some psychological factor
is directing the individual's judgments of behavioral and attitudinal
manifestations of the former but not of the latter needs.

Table 1 provides evidence about the second hypothesis, the differ-
ential contribution of sex to the first factor, which is consistently
stronger for boys than for girls at all grade levels. It is concluded
that the factor of social desirability is more general for the male
than for the female adolescents in the present sample and that the
second hypothesis is not verified.

Testing the third hypothesis necessitates a quantitative and
possibly a qualitative comparison of Messick’s factors with those




TABLE 3 


Number of Variables with Significant Loadings on Messick's Factors 
and on Present Factors for Grade XII Boys 


Factors 
1 2 3 4 5 6 7 8 9 


Messick 5 6 a1 3 5 6 3 
Present Study 20 22 21 35 20 971 728 КЕЕ 


found in the present investigation. The complete quantitative com-
parison was too bulky for extended citation, but a representative
selection is given in the accompanying three tables. Table 3 presents
evidence demonstrating the lack of overlap between the two sets of
factors.

The factor pattern for the Grade XII boys is markedly broad, 
in terms of the number of variables which are participating in the 
composition of the pattern, in comparison with Messick's nine fac- 
tors which contain relatively few variables with significant loadings. 
Some more detailed evidence on this point can be seen in Table 4 
where the number of variables with significant loadings on Mes- 
sick’s factors but which are merely present in the questionnaire 
used in this study is compared to the number of variables signifi- 
cantly loaded on the present factors which are found in Messick’s 
questionnaire. 


TABLE 4
Number of Variables with Significant Loadings on Messick's Factors
Appearing on the Present Questionnaire Compared to the Number
of Variables Significantly Loaded on Present Factors
Appearing in Messick’s Questionnaire


(Grade XII Boys)

Present                             Messick’s Factors
Factors      1      2      3      4      5      6      7      8      9
   1        4/12   5/12   8/12   3/12   4/12   4/12   3/12   4/12   4/12
   2        4/15   5/15   8/15   3/15   4/15   4/15   3/15   4/15  12/15
   3        4/13   5/13   8/13   3/13   4/13   4/13   3/13   4/13  12/13
   4        4/21   6/21   8/21   3/21   4/21   4/21   3/21   4/21  12/21
   5        4/16   5/16   8/16   3/16   4/16   4/16   3/16   4/16  12/16
   6        4/16   5/16   8/16   3/16   4/16   4/16   3/16   4/16  12/16
   7        4/19   5/19   8/19   3/19   4/19   4/19   3/19   4/19  12/19




Clearly the size of the ratios indicates the lack of overlap be-
tween Messick’s and the present factors. The actual number of
variables with significant loadings common to the two analyses is
presented in Table 5.

There is little similarity and overlap between the two sets of
factors; the narrower characteristic of Messick’s factor pattern
would seem to indicate that it cannot be generalized safely to other
groups.

In view of these quantitative differences, a qualitative comparison
of the present factors with Messick’s was regarded as superfluous
and was not undertaken. However, an attempt to describe the gen-
eral factor in psychological terms was prompted by the close rela-
tionship between the test instructions, a recent comment by Hebb
(1960), and some research by De Soto (1959). Part of the
instructions ran: “On the following pages you will find some state-
ments about things that people say they like to do. These statements
describe certain tendencies, preferences, or ‘traits.’ You are to make
a judgment of how desirable you think each trait would be in other
people.” The quotation from Hebb is to the effect that “. . .
mental processes of self-perception are the same processes, in large
part, that constitute the perception of another person” (1960, p.
742). And De Soto, investigating Edwards's definition of social
desirability, found that it appeared to be what the testee considers
most essential for his (the testee's) “well-being” and not really
what is socially approved. In other words, social desirability is
really personal desirability. In this case the present general factor


TABLE 5


Number of Variables with Significant Loadings Appearing in
Both Messick's and the Present Study


(Grade XII Boys)

[Table body illegible in the source scan. Columns: Messick’s Factors
1 to 9; rows: Present Factors.]




of social desirability might be a product of a factor or syndrome 
isolated from factorial studies of personality, and the oblimax 
factors might then represent traits associated with that factor. 

To this end an attempt was made to describe in psychological 
terms the oblimax factors for Grade XII boys. The first factor 
appeared to resemble the Guilford-Zimmerman Sociability (with 
positive loading on “To like to say things that are regarded as 
witty and clever by other people,” “To like to participate in fads 
and fashion,” and a negative loading on “To like to analyze your 
own motives and feelings”). The second factor resembled Davis’s 
(1944; 1952, p. 20, pp. 31-32) Socialized Anxiety inverted (with 
positive loadings on “To like to read books and plays in which sex 
plays a major part” and “To like to listen to or tell jokes in which 
sex plays a major part,” and negative loadings on “To like to put 
in longer hours of work without being distracted” and “To stay up 
late in order to get a job done”). The third factor was interpreted 
as Dominance-Submission or Ascendance (with a positive loading 
on “To like to argue for your point of view which is attacked by 
others,” and a negative loading on “To like to have friends who 
sympathize with you and try to cheer you up when you are de- 
pressed”). Factor 4 was interpreted as Control (with positive load- 
ings on “To like to be able to come and go as you want” and “To like 
to have your meals organized and a definite time set aside for 
eating”). Factors 6 and 5 were interpreted with diminishing confi- 
dence as Impulsivity and Liking for Change, and the seventh factor 
was unidentifiable. These factors resemble some of the components 
of extroversion-introversion recently collated by Carrigan (1960, pp. 
336-338), and from this it might be inferred that the psychological 
basis of the general factor of social desirability is extroversion. 

These predictions are logically implied in such an analysis and 
must be verified if the analysis is correct: first, males must be 
significantly more extroverted than females because they consist-
ently contribute more to the total variance, a point substantiated by
data cited by Carrigan (1960, p. 334). The second prediction is that 
a significantly large proportion of each of the present samples con- 
sists of extroverts, a point on which Saunders (1961) has provided 
evidence by reporting that the ratio of extroverts to introverts, as 
measured by responses to the Myers-Briggs test, is about 3 to 1. 
And, finally, a third prediction is that a second-order factor analysis 
of the intercorrelations among the primary factors for Grade XII 
boys will yield two factors identical with Carrigan’s two main 
components of extroversion, Social Extroversion, and Impulsivity 
versus Self-Control. The oblimax factors were intercorrelated and 
the unrotated loadings, derived by Hotelling’s Method of Principal 
Components, were rotated graphically to simple structure. The 
rotated loadings appear in Table 6. Factor 2 resembles Carrigan’s 
Impulsivity versus Self-Control, but the interpretation of Factor 1 
is less certain. 


TABLE 6 

Rotated Factor Loadings Derived from Second-Order Analysis of 
Primary Factor Intercorrelations 

Primary Factor                     Factor 1   Factor 2 
1  Sociability                      -.576       .109 
2  Socialized Anxiety Inverted       .653       .892 
3  Dominance-Submission              .762       .014 
4  Control                          -.782       .477 
5  Liking for Change                -.186       .590 
6  Impulsivity                       .706      -.644 
7  (Unknown)                         .314      -.773 


This interpretation of the general factor in terms of extroversion 
is reinforced by the recent findings of Couch and Keniston (1960) to 
the effect that “yeasayers” are characterized by extroverted im- 
pulsivity. 


Summary 


A matrix of intercorrelations, based on social desirability judg- 
ments made by rural adolescents to 54 items associated with Mur- 
ray's needs, was drawn up for six groups (Grade X boys, Grade X 


rotated to oblique simple structure using the oblimax criterion. The 
first centroid factor became increasingly general with successive 
grade levels, being particularly prominent in the analyses made of 
the responses of Grade XII boys and of the total sample. An 
analysis of the oblimax factors for Grade XII boys yielded six 
factors tentatively identified as Sociability, Socialized Anxiety in- 
verted, Ascendance, Control, Impulsivity, and Liking for Change, 
and a second-order analysis yielded two, one of which appeared to 
be similar to those listed by Carrigan as the primary components 
of extroversion, Social Extroversion and Impulsivity. The general 
factor was, therefore, tentatively identified psychologically as 
extroversion. At every grade level boys contributed more to it than 
girls. There was little similarity between the factors extracted in 
the present research and those of Messick (1960). 


REFERENCES 


Carrigan, P. M. “Extroversion-Introversion as a Dimension of 
Personality.” Psychological Bulletin, LVII (1960), 329-360. 

Couch, A. and Keniston, K. “Yeasayers and Naysayers: Agreeing 
Response Set as a Personality Variable.” Journal of Abnormal 
and Social Psychology, LX (1960), 151-174. 

Davis, A. “Socialisation and Adult Personality.” Forty-Third 
Yearbook of the National Society for the Study of Education, 
1944. 

Davis, A. Social-Class Influence Upon Learning. Cambridge: 
Harvard University Press, 1952. 

Davis, A. “The Ego and Status-Anxiety.” In White, L. D. (Editor), 
The State of the Social Sciences. Chicago: University of Chicago 
Press, 1956. 

De Soto, C. B., Kuethe, J. L. and Bosley, J. J. “A Redefinition of 
Social Desirability.” Journal of Abnormal and Social Psychology, 
LVIII (1959), 273-275. 

Eysenck, H. J. Manual of the Maudsley Personality Inventory. 
London: University of London Press, 1959. 

Hebb, D. O. “The American Revolution.” American Psychologist, 
XV (1960), 735-745. 

Messick, S. “Dimensions of Social Desirability.” Journal of 
Consulting Psychology, XXIV (1960), 279-287. 

Sanford, N. “The Approach of the Authoritarian Personality.” In 
McCary, J. L. (Editor), Psychology of Personality. New York: 
Grove Press, 1956. 

Saunders, D. R. Personal Communication, 1961. 

Wheelis, A. Quest for Identity. New York: Norton, 1958. 


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 3, 1962 


A FACTOR ANALYSIS OF INTERESTS IN CERTAIN 
SKILLED OCCUPATIONS 


LEONARD V. GORDON¹ AND ADOLPH V. ANDERSON 
U. S. Naval Personnel Research Field Activity, San Diego² 


Strong (1943, p. 315), in discussing the application of factor 
analysis to occupational interest, asks whether the factors resulting 
from these analyses are “really functional unities or ... merely 
mathematical coordinates in terms of which occupations may be 
located in space?” He goes on, “If the latter, they have served their 
purpose in identifying occupational groups and need be considered 
no further. If the former, then they should be identified and interest 
tests developed to measure them directly.” Reviewing the results of 
factor analyses performed on the scales of the Strong Vocational 
Interest Blank, Strong expresses a skepticism over the possibility of 
finding a few interest factors which will explain all interests, but he 
acknowledges that this may be due to the way in which his scales 
have been constructed. 

Guilford and his associates (1954, p. 29) are more optimistic re- 
garding the possibility of discovering primary interest factors. 
Discussing the results of their major factor analytic study, they 
state that their findings “support well the belief in vocational inter- 
est factors as genuine psychological unities.” Darley and Hagenah 
(1955, p. 176), however, question the utility of Guilford’s findings. 
They state, “Even if we conclude that these studies provide a better 
definition of the personal variables that ultimately become translated 
into occupational choice, or measured occupational adjustment, we are 
helpless to use such a conclusion in counseling until adequate tests of 
the variables are constructed and validated in the kinds of longitudinal 
studies that Strong and others have carried out.” 

¹ Now at U. S. Army Personnel Research Office, Washington, D. C. 
² The opinions expressed herein are those of the authors, and do not 
necessarily reflect those of the Department of the Navy. 

The present study represents a search for factors underlying 
interest in certain technical, clerical, and trade occupations. Identi- 
fication of such factors would give added support to Guilford's 
position, contribute to an understanding of the dimensions under- 
lying occupational interest, and provide dimensions upon which 
test development and longitudinal validation could be performed. 


Procedure 


The primary objective of the present factor analysis was to 
identify some minimum number of independent dimensions which 
would account for interest in 58 training programs which the Navy 
has to offer the new recruit. The type of training offered in these 
programs is, for the most part, very similar to that given in cor- 
responding civilian technical or trade schools, and encompasses a 
wide variety of skilled occupational specialties. To accomplish the 
aforementioned objective, it was considered necessary to obtain a 
set of activity items which would (a) represent the principal tasks 
performed in practically all of the Navy specialties and (b) include 
content which would be reasonably familiar to the new recruit. It 
was also necessary that the items be administrable in no more than 
one hour. Certain preliminary steps were taken to prepare a set of 
items which would meet these specifications. 

The United States Navy Occupational Handbook for Men (1954) 
was used as the primary source of items. This Handbook, available 
at high schools, recruiting depots, and recruit training centers, 
provides detailed information regarding occupational training 
opportunities available in the Navy. A secondary source, the Navy 
Job Classification Manual (1954), provided additional tasks not 
listed in the Occupational Handbook. From these two sources, 440 
different task description items were obtained for the 58 Navy 
specialties. 

Since it was believed that 300 items would be the maximum 
number that could be conveniently administered within the one- 
hour limit, it was necessary to reduce the size of the item pool while 
retaining representative tasks from all specialties. To accomplish 
this item reduction, the 440 items were sorted into 22 hypothesized 
interest areas. The items were then assembled into two forms of 210 
and 230 items, with each form having about the same relative 
representation of items from the 22 hypothesized interest areas. 
Each form was administered to a sample of about 400 recruits dur- 
ing their third day in the Navy. The recruits specified their degree 
of liking or disliking of each task, using a five-point scale. For each 
form, correlations were obtained between each item and each 
hypothesized interest area. In addition, preference values were 
obtained for each item. Items were then identified which were simi- 
lar to one another in item-interest area correlations, preference 
values, and content. Item reduction consisted of discarding items 
in task areas where there appeared to be a number of items with 
very similar characteristics. The 140 items which were eliminated 
in this manner were fairly equally divided among electrical, elec- 
tronic, clerical, and mechanical tasks. A substantial number of items 
from each of these categories was retained. 

Some of the items appeared to be couched in terms which would 
be unfamiliar to the new recruit. It was believed that, if such items 
were rewritten in more familiar terms, more valid responses would 
result. Therefore, each of the two original forms was administered 
to a sample of about 400 new recruits, the recruits being asked to 
specify, on a five-point scale, the degree to which they understood 
what was involved in each task. Items which the recruits found to 
be highly unfamiliar, and which had not been eliminated in the 
aforementioned item reduction process, were rewritten in more 
familiar terms (by way of example, “analyze vacuum tubes with a 
vacuum tube analyzer” was changed to “test radio tubes with a tube 
tester”). 

The final set of 300 items, on which the factor analysis was based, 
represented tasks performed in 58 Navy specialties. Some tasks were 
performed in more than one specialty. In all, the 300 tasks occurred 
a total of 863 times in all specialties. The number of different tasks 
per specialty ranged from 4 to 30, the median number being 10.5. 

³ A statistical summarization of the new recruits' preference for and 
familiarity with tasks performed in the various training programs is 
presented in Gordon and Steinemann (1961). 

For factor analysis purposes, the 300 items were administered to 
a sample of 400 recruits at the Naval Training Center, San Diego. 
At the time of testing, the recruits had been in the Navy less than 
three days and, for all practical purposes, had had no indoctrination 
regarding Navy occupations. About 75 per cent of the sample were 
17 or 18 years of age and about half were high school graduates. 
The recruits were asked to indicate their degree of preference for 
each of the 300 tasks using the following alternatives: 


1. I would very much like to do this 

2. I would like to do this 

3. I would neither like nor dislike to do this 
4. I would dislike to do this 

5. I would dislike very much to do this 


Because of the very large number of items involved, the Wherry- 
Gaylord method of factor analysis was used (Wherry & Gaylord, 
1943). This method does not require intercorrelations among the 
items, but yields results equivalent to those obtained by using the 
more traditional centroid method (Wherry, Perloff & Campbell, 
1951). 

To facilitate computations of the factor loadings, distributions of 
responses to the five alternatives were obtained for the answer 
sheets of the 400 recruits, and responses for each item were 
dichotomized at the point which would give closest to a 50 per cent 
split. Responses on each answer sheet were now coded 0 or 1, 
depending on whether the individual's response was below or above 
the point of cut. 
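
The dichotomization step can be illustrated with a minimal sketch. It is not the authors' procedure as run: the function names, the sample responses, and the choice to code the "like" side of the cut as 1 are all assumptions made for illustration, since the text does not say which side received which code.

```python
from collections import Counter

def best_cut(responses, n_alternatives=5):
    """Return the cut c (1 .. n_alternatives - 1) whose split is nearest 50/50.

    The cut separates responses <= c from responses > c.
    """
    counts = Counter(responses)
    n = len(responses)
    best_c, best_gap = None, None
    running = 0
    for c in range(1, n_alternatives):
        running += counts.get(c, 0)       # responses at or below the cut
        gap = abs(running / n - 0.5)      # distance from an even split
        if best_gap is None or gap < best_gap:
            best_c, best_gap = c, gap
    return best_c

def dichotomize(responses, cut):
    """Code responses on the 'like' side of the cut as 1 (assumed direction)."""
    return [1 if r <= cut else 0 for r in responses]

# Hypothetical responses of ten recruits to one item (1 = very much like).
item = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
cut = best_cut(item)
coded = dichotomize(item, cut)
```

With these invented responses the cut falls after alternative 3, giving a 60/40 split, the closest available to an even division.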

As a starting point for the factor analysis, the 22 hypothesized 
interest areas, developed from the earlier item pool, were used as a 
first set of “a priori” factors. Items believed to represent each of 
these factors were identified, and scoring keys were prepared using 
these items. The papers were then scored for all 22 “a priori” factors, 
and intercorrelations among the factor scores were then obtained. 
Examination of the pattern of intercorrelations indicated 
that the number of “a priori” factors could safely be reduced to 17 
by simple amalgamation of several of the keys. Preliminary factor 
loadings, in the form of tetrachoric coefficients of correlation between 
the item and each “a priori” factor score, were obtained with 
the aid of the Mosier-McQuitty abac. Factor purification proceeded 
through four iterations, with a consequent reduction of the number 
of factors from 17 to 8. Intercorrelations were then obtained among 
the eight factor scores. A transformation matrix, based on the inter- 
factor correlations, was used to obtain orthogonal factor loadings 
for the items. The eight factors were then rotated for meaningfulness. 
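
The preliminary loadings above were read from a graphical chart. As a rough illustration of what a tetrachoric coefficient expresses, the sketch below uses the cosine-pi approximation, which is a different and cruder technique than the Mosier-McQuitty abac. It assumes that both the item and the factor score have been dichotomized; the cell labels and counts are hypothetical.

```python
import math

def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    Assumed 2 x 2 layout of counts:
                     key score high   key score low
        item = 1           a                b
        item = 0           c                d
    """
    concordant, discordant = a * d, b * c
    if concordant == 0 and discordant == 0:
        raise ValueError("degenerate table: correlation is undefined")
    if discordant == 0:
        return 1.0                        # perfect positive association
    ratio = math.sqrt(concordant / discordant)
    return math.cos(math.pi / (1.0 + ratio))

# Hypothetical counts for 100 recruits on one item and one a priori key.
r_none = tetrachoric_cos_pi(25, 25, 25, 25)   # no association
r_high = tetrachoric_cos_pi(40, 10, 10, 40)   # strong association
```

An item would then be retained on the key with which it shows the highest coefficient, which is the general idea of iterative purification; the exact decision rules used in the study are not stated in the text.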


Results 


Descriptions of the eight factors are presented below.* The dozen 
items having the highest loadings on each factor are presented in 
Table 1. 


TABLE 1 

Orthogonal Loadings of Items With the Highest 
Loadings on the Eight Interest Factors 

Factor I Clerical 
Loading Description 
.84   type messages on a teletype machine 
.81   type a payroll 
.79   fill out forms for purchase, receipt, issue, and sales of supplies 
        and equipment 
      copy letters, forms, and reports on a typewriter 
      put letters and papers in order in a filing cabinet 
.75   write newspaper or magazine articles about ships and naval 
        personnel 
.73   check letters and reports for accuracy 
.68   read printer's copy for errors 
.68   use a typewriter to copy messages coming over the radio 
.66   add columns of figures with an adding machine to check payroll 
        records 
.66   take shorthand and do typing in an office 
.65   keep a record of registered mail and packages received in an 
        office 

Factor II Electrical-Electronic 
Loading Description 
.90   test electrical equipment for shorts, grounds, and other break- 
        downs 
.89   check radio tubes with a tube tester 
.86   find the trouble and make repairs to electric power and light 
        circuits 
.86   use a wiring diagram to check the operation of electronic equip- 
        ment 
.85   learn how a radio tube works 
.83   use a voltmeter, ammeter, soldering iron, and other equipment 
        to repair electronic devices such as radio, radar, and sonar 
.79   use such testing equipment as voltmeters, ammeters, and ohm- 
        meters in checking and testing electrical circuits and equip- 
        ment 
.76   use a screwdriver to help install electronic aiming equipment 
        in a guided missile 
      regulate and make minor adjustments to radio transmitters and 
        receivers 
      repair, adjust, and install electronic equipment, such as radio 
        or radar, in airplanes 
      locate and replace defective parts in electronic equipment 
      check radio, radar, and other electronic equipment in airplanes 
        before they take off 

Factor III Mechanical 
Description 
      cut, bend, solder, and rivet sheet metal to make ventilation 
        ducts, lockers, and tanks 
      shape metal parts according to blueprints by using 
        machine tools such as milling machines and lathes 
      check and lubricate refrigeration and air conditioning machinery 
      learn how gasoline or diesel engines work 
      replace a valve in a ship’s fuel line 
      cut various types of screwthreads on a lathe 
      use all power-driven tools commonly used in engine repair and 
        overhaul, including lathes, power drills, and valve facing 
        machines 
      straighten frames and chassis of trucks and construction equip- 
        ment with hydraulic jacks and chains 
      use blacksmithing tools to make shackles, chain hooks, and 
        brackets 
      use hand tools to take apart and repair hydraulic gun 
        machinery 
      use equipment for identifying different metals and alloys in a 
        foundry 
      use a torch to solder joints in copper tubing 

Factor IV Medical-Dental 
Loading Description 
      assist a doctor in an operating room 
      sterilize and arrange instruments for a dentist 
      assist a dentist in extracting teeth 
      sterilize and arrange surgical instruments for a doctor 
      mix and prepare various types of medicines 
      plan special diets for hospital patients according to instructions 
        from a doctor 
      make dental plates, crowns, and bridges from wax models of 
        teeth and gums 
      teach proper methods of caring for teeth to individual 
        patients 
      use chemicals and a microscope to analyze samples of blood 
        and urine 
.68   take and record the pulse rate, temperatures, and blood pressure 
        of patients 
.63   operate an X-ray machine 
.61   interview patients and make a record of their illnesses and medi- 
        cal histories for a doctor 

Factor V Navigation 
Description 
      learn to use and make weather maps 
      use weather instruments and messages to make weather maps 
      use drawing instruments to make maps 
      draw in accurate corrections to charts and maps 
      taking all weather conditions into account, prepare diagrams 
        showing best flight levels for aircraft making long flights 
      use a T-square, dividers, and other draftsman’s tools to make 
        accurate drawings of machine parts 
      use weather messages and charts to forecast the weather for pilots 
      check and inspect weather instruments used to measure wind 
        speed, temperature, and humidity 
      read and record directions from a magnetic or gyrocompass 
      use a book of tables showing ocean currents and tides to assist 
        the navigator of a ship 
      operate the remote control electronic equipment used to control 
        guided missiles 
      use small hand tools to adjust binocular lenses 

Factor VI Aviation 
Description 
      clean airplane windshields 
      install and check gunsights and bombsights 
      check, store, and issue aircraft equipment and parts 
      help place airplanes into position for launching from a catapult 
      operate wing flaps and landing gear controls as an airplane flight 
        engineer 
      repair airplane parts such as wings, elevators, ailerons, tabs, 
        and rudders 
      change an airplane tire 
      operate the radio set in an airplane and report messages to the 
        pilot 
      replace the cables which control the rudder and elevators in an 
        airplane 
      check carburetors and fuel systems of aircraft 
      remove, clean, and replace airplane engine carburetors 
      inspect airplanes before they take off by checking the controls, 
        tires, gasoline, etc. 

Factor VII Hazardous Duty 
Loading Description 
.85   be a member of submarine crew 
.84   use a diving suit and equipment to place underwater explosives 
        for destroying enemy ships, docks, and bridges 
.77   try out for submarine service by staying in an air pressure 
        under high pressure for 15 minutes 
.75   install parts in mines, such as clock starters and firing pins 
.71   use a diving suit and equipment to inspect underwater parts of 
        ships, docks, and bridges 
      live under crowded conditions as a member of a submarine crew 
.69   take apart rockets, bombs, and shells that have failed to explode 
.68   check fuel, air, and water tanks in torpedoes 
.67   handle, load, and plant mines 
.66   learn to operate all machinery and equipment aboard a 
        submarine 
.63   set and fire dynamite and other explosives in preparation for 
        new construction 
      handle and store high explosives, mines, and depth charges 

Factor VIII Service 
Description 
      use sewing machines and other equipment to repair parachutes 
      load clothing into washing and drying machines in a laundry 
      replace worn heels on shoes 
      be in charge of a small number of men on a crew cleaning 
      clean rooms and make beds in officers’ living quarters 
      give haircuts 
      inspect food storage lockers and refrigerators for cleanliness 
      prepare soups, vegetables, meats, salads, and desserts 
      be a member of a work crew cleaning a building 
      use a sewing machine to shorten or lengthen sleeves or trousers 
        on Navy uniforms 
      cut and sew cloth to repair signal flags 
      wash pots and pans 

* To conserve space, the Table containing all items and the complete factor 
matrix has been deposited with the American Documentation Institute. Order 
Document No. 7161, remitting $1.75 for 35-mm. microfilm or $2.50 for photo- 
copies from Chief, Photoduplication Service, Library of Congress, Washing- 
ton 25, D. C. 


Factor I has been labeled Clerical. Most of the items with high 
loadings describe the type of work performed by clerical or steno- 
graphic personnel. Other office-type duties appear on this factor 
with relatively high loadings. 

Factor II, defined as Electrical-Electronic, represents an interest 
in performing electrical and electronic repair and maintenance. 
Despite attempts to keep separate electrical and electronic keys 
during the iterative procedure, these two keys clearly merged to 
form a single factor. 




Factor III, labeled Mechanical, represents Mechanical and Con- 
struction tasks. Metalwork and engine repair dominate among items 
with the highest loading. Carpentry and other construction items 
also appear on this factor with fairly high loadings. An attempt was 
made to maintain separate mechanical and construction factors 
during the iteration. However, a single factor, subsuming these two 
types of tasks, was clearly indicated. 

Factor IV, defined as Medical-Dental, represents an interest in 
medical and dental tasks. A number of tasks related to working 
with chemicals also have moderate loadings on this factor. 

Factor V has been named Navigation. Items related to working 
with maps and drawing have very high loadings on this factor. In 
addition, items reflecting interest in weather, instruments used in 
navigation, and directional guidance systems have high loadings on 
this factor. 

Factor VI reflects an interest in Aviation. The tasks with high 
loadings are somewhat diverse, and include activities of a mechani- 
cal, electronic, and service nature. This factor appears to represent 
an interest in aviation, per se. 

Factor VII represents an interest in engaging in activities having 
an element of danger, and has been labeled Hazardous Duty. Items 
related to work in submarines, diving, and ordnance predominate. 

Factor VIII has been named Service. Domestic tasks, typically 
done by women, such as sewing, laundering, cleaning, and kitchen 
work have the highest loadings on this factor. 

Discussion 

The results of the present study rather clearly support Guilford’s 
conclusion that psychologically meaningful occupational interest 
factors can be found. Two of the eight factors identified in the 
present analysis, Mechanical and Clerical, have been established 
in other analyses, while a third factor, Electrical-Electronic, resem- 
bles in many respects the Scientific factor which has been found 
for professional level occupations (Guilford, 1959). Tasks with the 
highest loading on this last factor deal with electronic maintenance 
or trouble shooting, and involve logical and mathematical thinking 
and a need for precision. Thus, the Electrical-Electronic factor may 
represent Scientific interest at a lower occupational level. 

Hazardous Duty is very similar to the Adventure vs. Security 
factor isolated by Guilford and his associates (1954) for both officer 
and enlisted Air Force samples. The items with high loadings on the 
Hazardous Duty factor correspond closely to items in the 
“Adventure-exploration” and “Adventure-risk taking personal” subtests, 
which define Guilford’s Adventure vs. Security factor. 

Guilford (1959) has inferred the existence of an Aviation interest 
factor from analyses of Information Test and Biographical Infor- 
mation Blank data. The present analysis has isolated an Aviation 
interest factor through the use of actual interest items. 

The Service factor resembles the Domestic Interest factor found 
by Gernes (Guilford, 1959), which was believed to be unique to 
women. The present analysis verifies the identity of Gernes’ Domes- 
tic Interest factor and demonstrates the existence of this factor in a 
male population as well. 

The Medical-Dental factor includes tasks of a lower skill level 
than those performed in professional counterpart occupations. While 
this factor has not been reported in factor analytic studies, it is of 
interest to note that one of the homogeneous keys developed by Gee 
and Clark (1956), for the Minnesota Vocational Interest Inventory, 
represented an interest in medical and hospital service activities. 

The Navigation factor reflects an interest primarily in activities 
revolving around the making and/or use of maps and charts for 
various purposes. To the writers’ knowledge no counterpart factor 
has been found in other analyses. 

The results of the present analysis have been applied in the 
development of an interest inventory. The broad-scale, longitudinal 
validation of this instrument is described elsewhere (Gordon & 
Alf, 1962). 


REFERENCES 


Bureau of Naval Personnel, Department of the Navy. U. S. Navy 
Job Classification Manual, (NAVPERS 16105). Washington, 
D. C., 1954. 

Bureau of Naval Personnel, Department of the Navy. U. S. Navy 
Occupational Handbook. Washington 25, D. C., 1954. 

Darley, J. G. and Hagenah, Theda. Vocational Interest Measurement. 
Minneapolis: University of Minnesota Press, 1955. 

Gee, Helen H. and Clark, K. E. “A Comparison of Empirical and 
Homogeneous Keys in Interest Measurement.” Technical 
Report, University of Minnesota, Nonr-710(17), 1956. 

Gordon, L. V. and Alf, E. F. “The Predictive Validity of Measured 
Interest for Navy Vocational Training.” Journal of Applied 
Psychology, XLVI (1962), 212-219. 

Gordon, L. V. and Steinemann, J. H. “Occupational Information 
and Pre-Service Counselling.” Personnel and Guidance Journal, 
XXXIX (1961), 502-506. 

Guilford, J. P. Personality. New York: McGraw-Hill, 1959. 

Guilford, J. P., Christensen, P. R., Bond, N. A., Jr., and Sutton, 
Marcella A. “A Factor Analysis of Human Interests.” Psycho- 
logical Monographs, LXVIII (1954), No. 4. 

Strong, E. K. Vocational Interests of Men and Women. Stanford: 
Stanford University Press, 1943. 

Wherry, R. J. and Gaylord, R. H. “The Concept of Test and Item 
Reliability in Relation to Factor Pattern.” Psychometrika, VIII 
(1943), 247-269. 

Wherry, R. J., Perloff, R., and Campbell, J. T. “An Empirical 
Verification of the Wherry-Gaylord Iterative Factor Analysis 
Procedure.” Psychometrika, XVI (1951), 67-74. 


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 3, 1962 


A COMPARISON OF THE FACTOR STRUCTURE OF THE 
KUDER OCCUPATIONAL, FORM D 
FOR MALES AND FEMALES 


RICHARD E. SCHUTZ AND ROBERT L. BAKER 
Arizona State University 


A previous factor analysis (Schutz & Baker, 1962) of the Kuder 
Preference Record—Occupational, Form D based on the responses 
of 450 college freshmen males yielded seven readily interpretable 
rotated factors: I. Interpersonal-Directive; II. Engineering-Phys- 
ical Science; III. Business-Detail; IV. Business-Aesthetic; V. 
Verbal-Directive; VI. Outdoor; VII. Health Scientist. Since the 
Occupational can be appropriately used with females as well as 
with males, information concerning the factor structure for females 
is of equal concern. 

The purpose of the present study was to determine the under- 
lying factor structure of occupational interests of women as meas- 
ured by the Occupational, Form D and to note differences, if any, 
between the factor structure of men and women. In addition to being 
of some theoretical interest, such a study should help the vocational 
counselor decide if differential treatment of male and female re- 
sponses to the Kuder Occupational is necessary. 


Method 


The Occupational was administered to all freshmen students entering 
Arizona State University in fall, 1959. A representative sample 
of 488 females having Verification scores of 49 and above was 
used in the present study. The answer sheets were machine scored 
and the 42 raw scores for each examinee were punched into IBM 
cards. 

A product-moment intercorrelation matrix was prepared and a 
principal components analysis was performed. Components with 
eigenvalues greater than unity were rotated to simple structure 
using normalized Varimax procedures. All statistical computations 
were performed using an IBM 709 computer.¹ 
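
What the normalized Varimax criterion rewards can be seen in a two-factor sketch. The code below is an illustration only: it searches a grid of rotation angles rather than using the analytic solution, it handles two factors instead of the eight rotated in the study, and the loading matrix is invented. Rows are scaled to unit length before the variance of the squared loadings is computed, which is the "normalized" part; a row of all zeros would need special handling and is not covered here.

```python
import math

def rotate(loadings, degrees):
    """Rotate a two-column loading matrix through the given angle."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(x * c - y * s, x * s + y * c) for x, y in loadings]

def varimax_criterion(loadings):
    """Sum over factors of the variance of squared, row-normalized loadings."""
    n = len(loadings)
    normed = []
    for x, y in loadings:
        h = math.hypot(x, y)              # square root of the communality
        normed.append((x / h, y / h))
    total = 0.0
    for j in range(2):
        squared = [row[j] ** 2 for row in normed]
        mean = sum(squared) / n
        total += sum((v - mean) ** 2 for v in squared) / n
    return total

def best_angle(loadings, step=1):
    """Grid search over -45..45 degrees for the criterion's maximum."""
    angles = range(-45, 46, step)
    return max(angles, key=lambda a: varimax_criterion(rotate(loadings, a)))

# Hypothetical simple structure, disguised by a 30-degree rotation.
simple = [(0.8, 0.0), (0.6, 0.0), (0.0, 0.8), (0.0, 0.6)]
unrotated = rotate(simple, 30)
angle = best_angle(unrotated)
recovered = rotate(unrotated, angle)
```

In this constructed case the search undoes the 30-degree disguise, so each scale again loads on one factor only, which is the sense of "simple structure" used in the text.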


Results 


The analysis yielded eight rotated factors. Tables 1-8 present the 
scales with loadings above .30 on each of the factors. 

Factor 1 has been labeled Verbal-Directive. The contributing 
scales and their loadings suggest that in addition to the verbal or 
linguistic element, this dimension also involves a common element 
of authority or ascendency. This factor is directly parallel to the 
male Factor 5. With the exception of X-ray Technician and Pedia- 
trician, the contributing scales on the male and female factors are 
identical. 

Factor 2 has been designated as Engineering-Physical Scientist. 
The factor is parallel to the male Factor 2. The County Agricul- 
tural Agent, Architect, and Industrial Psychologist scales contribute 
moderate variance to the female factor, while they do not appear 
on the male Factor 2. Other than these differences, the scales con- 
tributing variance to the two factors are common. 


TABLE 1 
Factor 1: Verbal-Directive 

Scale No.  Loading  Occupation 
   20        .91    Journalist (1) 
    6        .87    Newspaper Editor (2) 
   42        .75    Librarian (Male) (3) 
   22        .72    Lawyer (4) 
   16        .45    Psychologist—Professor 
    8        .45    Psychologist—Clinical 
   29               Psychiatrist 
   19        .42    Psychologist—Counseling 
   30        .42    Radio Station Manager 
             .42    Minister 
    9        .35    Psychologist—Industrial 


¹ ... the courtesy of Western Data Processing Center ... for his 
assistance in expediting the analysis at WDPC. 


TABLE 2 
Factor 2: Engineering-Physical Science 

Scale No.  Loading  Occupation 
   17        .85    Civil Engineer (2) 
   18        .83    Mechanical Engineer (3) 
    2        .82    Electrical Engineer (4) 
   36        .82    Mining and Metallurgical Engineer (1) 
   27        .70    Industrial Engineer 
   12        .64    Accountant 
   13        .64    Meteorologist 
   34        .63    High School Mathematics Teacher 
   35        .60    Chemist 
   21        .42    Architect 
   33        .40    High School Science Teacher 
    9        .32    Psychologist—Industrial 


Factor 3, like the male Factor 7, contains a rather homogeneous 
pattern of scales. The factor has been labeled Health Scientist. 
Thirteen scales have loadings greater than .30 on the factor. Each 
of the 13 scales also appears on the male Factor 7. However, the 
male factor also contains five additional scales: Psychologist— 
Clinical, Mining and Metallurgical Engineer, Minister, and In- 
surance Agent. Thus the female factor appears somewhat more 
homogeneous than the male factor. 

Factors 4 and 5 are parallel to the two Business factors isolated 
for males. Factor 4 is again labeled Business-Detail. It includes 


TABLE 3
Factor 3: Health Scientist

Scale No.   Loading   Occupation
    7         .84     Physician (2)
   25         .82     Dentist (1)
   28         .76     Pediatrician (3)
   26         .75     Veterinarian (7)
   37         .75     Druggist
   39         .75     X-Ray Technician
   33         .66     High School Science Teacher
   35         .54     Chemist
   29         .49     Psychiatrist
   16         .46     Psychologist—Professor
   13         .46     Meteorologist
   34                 High School Mathematics Teacher
   19         .30     Psychologist—Counseling




TABLE 4
Factor 4: Business-Detail

Scale No.   Loading   Occupation
   23         .80     Retail Clothier (2)
   24         .79     Insurance Agent (4)
   40         .65     Bank Cashier (1)
   30         .57     Radio Station Manager (6)
   15         .44     Department Store Salesman
   41         .42     Pharmaceutical Salesman
   12         .40     Accountant
   14         .38     Personnel Manager
   37         .38     Druggist
    1         .30     County Agricultural Agent


each of the scales found on the male Factor 3, and has four other
moderately loaded scales. Factor 5 is parallel to the factor labeled
Business-Aesthetic in the male analysis.

Factor 6 has only two variables with loadings above .30 and is
not easily interpreted. It is interesting to note that the Job Printer
scale, which contributes the major portion of the variance to this


TABLE 5
Factor 5: Business-Aesthetic

Scale No.   Loading   Occupation
   31                 Interior Decorator (1)
    3        -.70     Farmer (3)
   21         .67     Architect (2)
   34        -.46     High School Mathematics Teacher (6)
   15         .45     Department Store Salesman
   11        -.38     School Superintendent
   40        -.38     Bank Cashier
   30         .35     Radio Station Manager
    8         .35     Psychologist—Clinical
   29         .31     Psychiatrist

TABLE 6
Factor 6: Job Printer

Scale No.   Loading   Occupation
   38         .80     Job Printer
   39         .36     X-Ray Technician




TABLE 7 
Factor 7: Outdoor 
Scale No. Loading Occupation 
4 .75 Forester (2) 
1 .63 County Agricultural Agent (1) 
26 .37 Veterinarian (4) 




factor, does not appear on any other factor. A similar doublet fac- 
tor, heavily loaded on the Job Printer scale, appears in the male 
analysis. 

Factor 7 is clearly an “Outdoor” factor. The male Factor 6 in- 
cludes all three of the scales appearing here plus Farmer, .87, and 
Accountant, .33. 

Factor 8 contains 19 scales with loadings above .30. Being a direct 
parallel of the male Factor 1, it has been labeled Interpersonal- 
Directive. Each of the occupations involves an interpersonal rela- 
tionship of some sort. Moreover, the relationship is one in which 
an authority figure attempts to manipulate, direct, or otherwise 
control the behavior of other individuals. Seventeen of the 19 con- 
tributing scales appear on male Factor 1. 


TABLE 8
Factor 8: Interpersonal-Directive

Scale No.   Loading   Occupation
   10         .86     YMCA Secretary (3)
   32         .81     High School Counselor (2)
   14         .75     Personnel Manager (1)
    9         .67     Psychologist—Industrial (4)
    5         .66     Minister
   19         .63     Psychologist—Counseling
   11         .61     School Superintendent
   41         .60     Pharmaceutical Salesman
    8         .58     Psychologist—Clinical
   29         .51     Psychiatrist
   30         .43     Radio Station Manager
   22         .42     Lawyer
   16         .40     Psychologist—Professor
   33         .37     High School Science Teacher
   15         .37     Department Store Salesman
   27         .36     Industrial Engineer
   42         .33     Librarian (Male)
    1         .32     County Agricultural Agent
   24         .31     Insurance Agent


TABLE 9
Scale Means and Standard Deviations for Males and Females

                                            Males           Females
Scale                                    Mean    S.D.    Mean    S.D.
 1 County Agricultural Agent            383.14   6.41   30.49    4.65
 2 Electrical Engineer                   33.55   6.02   27.73    5.75
 3 Farmer                                52.27   8.49   47.64    77
 4 Forester                              41.77   6.48   34.62    6.05
 5 Minister                              82.57   7.16   43.79    6.83
 6 Newspaper Editor                      27.42   7.01   33.01    7.11
 7 Physician                             27.91   6.15   31.29    5.78
 8 Psychologist—Clinical                 35.06   8.47   43.01    8.13
 9 Psychologist—Industrial               34.64   8.00   37.86    7.27
10 YMCA Secretary                        40.75   8.77   51.39    8.36
11 School Superintendent                 71.71   7.88   43.05    6.44
12 Accountant                            40.16   6.38   40.42    7.09
13 Meteorologist                         40.86   6.58   37.79    6.48
14 Personnel Manager                     41.06   8.30   47.21    7.66
15 Department Store Salesman             41.02   8.83   52.70    7.87
16 Psychologist—Professor                46.19   8.70   49.56    8.90
17 Civil Engineer                        47.00   5.57   40.83    6.28
18 Mechanical Engineer                   43.12   0.62   37.90    6.45
19 Psychologist—Counseling               54.011  9.97   61.99   10.02
20 Journalist                            41.22   7.90   45.65    8.20
21 Architect                             51.78   8.50   57.56
22 Lawyer                                45.63   9.13   51.00    8.5
23 Retail Clothier                       35.08   8.15   42.01    7.57
24 Insurance Agent                       30.02   6.21   31.24    6.18
25 Dentist                               37.90   6.38   42.82    6.39
26 Veterinarian                          46.08   6.16   43.46    6.76
27 Industrial Engineer                   38.66   7.18   31.83    6.82
28 Pediatrician                          38.80   8.18   46.64    7.54
29 Psychiatrist                          38.80   8.40   46.32    7.62
30 Radio Station Manager                 37.74   7.55   40.88    6.89
31 Interior Decorator                    42.45   8.44   39.90    8.19
32 High School Counselor                 36.43   7.65   44.44    7.09
33 High School Science Teacher           36.34   7.92   38.40    6.98
34 High School Mathematics Teacher       34.52   6.31   35.15    5.87
35 Chemist                               43.70   8.98   42.69    7.95
36 Mining and Metallurgical Engineer     361     6.38   30.72    6.23
37 Druggist                              37.88   6.23   40.34    5.97
38 Job Printer                           30.98   5.97   36.26    5.65
39 X-Ray Technician                      42.49   6.64   45.34    6.36
40 Bank Cashier                          45.40   6.46   45.13    6.68
41 Pharmaceutical Salesman               50.84  10.79   56.84    9.32
42 Librarian (Male)                      41.56   8.71            8.27


Table 9 presents means and standard deviations for the male and 
female sample on each of the scales. Due to the large N’s involved, 




each of the differences in sex means is statistically significant at the 
.05 level, with the one exception of the Accountant scale. 
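
The kind of comparison reported here can be sketched as a large-sample test of the difference between two independent means. This is only an illustration under stated assumptions: the female N of 488 comes from the Summary, but the male N is not given in this section, so `N_MALE_ASSUMED` is a hypothetical value, and the authors do not say which test they actually used.

```python
import math

def two_sample_z(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Large-sample z statistic for the difference between two independent means."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error

N_FEMALE = 488          # reported in the Summary
N_MALE_ASSUMED = 500    # hypothetical: the male N is not stated in this section
CRITICAL_Z = 1.96       # two-tailed .05 level

# Means and standard deviations copied from Table 9 (males first, females second)
z_accountant = two_sample_z(40.16, 6.38, N_MALE_ASSUMED, 40.42, 7.09, N_FEMALE)
z_forester = two_sample_z(41.77, 6.48, N_MALE_ASSUMED, 34.62, 6.05, N_FEMALE)

print("Accountant significant:", abs(z_accountant) > CRITICAL_Z)
print("Forester significant:", abs(z_forester) > CRITICAL_Z)
```

Under the assumed male N, the Accountant difference falls short of the critical value while the Forester difference exceeds it by a wide margin, which is consistent with the statement in the text.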


Discussion 

The factors appear to reflect seven readily interpretable dimen- 
sions which are tapped by the Occupational. Each of these seven 
factors has a direct parallel in the male factor structure. However, 
a larger number of scales proved to be more factorially complex in 
the analysis of female responses than in the male analysis. In the 
female analysis 24 scales appear on two factors with loadings 
greater than .30, compared to 16 scales appearing twice for men. 
Ten scales appear on three factors in the female analysis while 
only two scales have three listed loadings in the male analysis. 
The factorial complexity is particularly evident for scales involv- 
ing some aspect of interpersonal relations. 
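
The notion of factorial complexity used above can be illustrated by counting, for each scale, the factors on which its loading exceeds .30 in absolute value. The sketch below uses a handful of loadings copied from Tables 2 through 8; the dictionary layout and the function name `complexity` are illustrative choices, not part of the original analysis.

```python
SALIENT = 0.30  # the cutoff used throughout the tables

# Loadings copied from Tables 2-8 for a few scales, keyed by factor number.
# Factors on which a scale is not listed are simply omitted.
loadings = {
    "Job Printer": {6: 0.80},
    "Accountant": {2: 0.64, 4: 0.40},
    "Chemist": {2: 0.60, 3: 0.54},
    "High School Science Teacher": {2: 0.40, 3: 0.66, 8: 0.37},
}

def complexity(scale_loadings, threshold=SALIENT):
    """Number of factors on which the scale has a salient loading."""
    return sum(1 for value in scale_loadings.values() if abs(value) > threshold)

complexities = {scale: complexity(by_factor) for scale, by_factor in loadings.items()}

for scale, count in complexities.items():
    print(f"{scale}: appears on {count} factor(s)")
```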

Even though the general factor structure is the same for men and 
women, the findings suggest a possible sex difference in the de-
terminants of the scale scores. That is, women may have a tendency,
when responding to the Form D, to be influenced more by the inter- 
personal components of the various items; while men may be more 
strongly influenced by the impersonal or technical aspects of the 
items. 

The implications of the findings for practical test usage are 
similar to those for the male analysis. If one wishes to obtain an 
efficient cross section of an examinee's occupational interests, it 
would appear efficient to score the two or three scales which con- 
tribute the greatest variance to each of the seven factors. 

The similarity in factor structure for the two sexes suggests that 
the same profile may be used for reporting the results of both male 
and female responses to the Occupational. For the first four scales 
on each factor in Tables 1-8, a number is listed in parentheses. 
The number indicates the rank position of each scale on the parallel 
male factor. Inspection reveals a close correspondence between the 
heavily loaded scales for the two sexes. 

On the other hand the findings provide no basis for concluding 
that male and female results may be treated in identical fashion. 
The sex differences in the means and standard deviations reported 
in Table 9 suggest that such a procedure may be inappropriate for 
many purposes. From a normative point of view the differences de- 


predictive validity,

A factor analysis, based on correlational data, does not reflect
absolute differences in interest levels. However, the interpretation 
of an individual examinee's Occupational results is conventionally 
an ipsative rather than a normative matter. Elevation and scatter 
of interests are usually emphasized to a greater extent than absolute 
level of interests. In such situations, a single profile, based on the 
D. R. scores accompanying the keys and arranged in correspond- 
ence with the factor structure described here, should prove satis- 
factory for both males and females. 


Summary 

A factor analysis of the 42 raw scores obtained on the Kuder
Preference Record—Occupational, Form D by 488 college freshmen
females yielded a factor structure directly comparable to that ob-
tained previously for a sample of males drawn from the same
population. Although the eight factors extracted in each analysis
appeared to be directly parallel, certain nuances in the pattern of
loadings for several scales were identified. The implications of the
findings for practical test usage were discussed.


REFERENCE

Schutz, R. E. and Baker, R. L. "A Factor Analysis of the Kuder
Preference Record—Occupational, Form D." EDUCATIONAL AND
PSYCHOLOGICAL MEASUREMENT, XXII (1962), 97-104.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 3, 1962


CONTROLLING THE EFFECTS OF "CLOUDING 
VARIABLES" IN MULTIVARIATE RESEARCH DESIGNS 


HARVE E. RAWSON 
AND 
SALOMON RETTIG 


Research Division, Columbus Psychiatric Institute and Hospital 
Ohio State University 


In multivariate educational and psychological research, it fre-
quently becomes necessary to include a set of variables which add to
the prediction of a given set of behaviors (criterion), but at the 
same time obscure the relationship between the independent varia- 
bles and the same criterion. Thus, for example, an upper-level col- 
lege student in an elementary college course may tend to seek out a 
more difficult task (more complicated subject matter in a term 
paper) than a first-year student. Because of the more difficult self- 
assignment, the upper-level student may obtain a poorer grade than 
a first-year student. If one wishes to obtain a correlation between 
student’s intelligence and grade performance in this course, one may 
wish to control for difficulty in task assignment, without removing 
the effect of year in college. Hence, year in college can be considered 
a “clouding variable” (DuBois, 1957) in that on the one hand it 
may contribute to the relationship between intelligence and grade 
performance, yet on the other hand it may tend to obscure the same 
relationship. 

In simple univariate research designs the necessary controls can 
usually be accomplished through a careful matching of subjects or 
through means of partial or semi-partial correlation. In more com- 
plex multivariate research designs, however, the investigator is often 
perplexed by the multitude of controls he must apply simultane- 
ously. Two approaches are commonly employed: 


(1) The investigator matches subjects on all the control variables 


introduced. This task is extremely difficult, if not impossible, if the 
number of control variables is over two or three. Moreover, the 
available pool of subjects must be extremely large if the number of 
correctly matched subjects actually used is to approximate an ade- 
quate sample. 

(2) Along these same lines, the complex analysis of variance or 
co-variance design is often employed. The most serious disadvan- 
tage of this design is the number of control variables which can be 
introduced. Even controlling for three variables can lead to a disas- 
trously small n for some cells in certain research. If the number of 
cases in the cells is unequal, some proportionality correction must be 
made to the variance estimate. Moreover, the researcher is often 
faced with a bewildering multitude of interaction terms, especially 
if the number of variables to be controlled is four or more. Many 
researchers find that these complex interaction terms are difficult to 
interpret and more often than not add little to the meaningfulness 
of the research findings.

In research where a series of criterion controls have to be intro- 
duced (such as sex, age, income, residence, etc.) in order to further 
clarify any research findings, the methodology should ideally meet 
the following criteria: 


1) The available pool of subjects should be maximally utilized
so as to obtain the largest number of cases possible and thereby
reduce the sampling error.

2) An estimate of the interaction¹ of the controlled variables with
the criterion and with the independent variables, as well as with
each other, should be available to the researcher.

3) The interaction terms should be comparable to each other in
the amount of criterion variance they explain.

4) All of the above should be combined into a meaningful 
methodological framework which minimizes the amount of compu- 
tational drudgery. 

A special application of multiple semi-partial correlation analysis
can meet all of the above criteria. In addition, this technique yields
a) a separate estimate of maximally explained variance of the
dependent variable by the independent variables; b) a separate
estimate of the maximally explained variance of the dependent
variable by the “clouding variables” alone; and c) an estimate of
the maximally explained variance of the dependent variable by
both the independent variables and the “clouding variables”
combined.

¹ By interaction, the authors make reference to the correlation between two
variables (commonly called co-variance). The use of this term differs from
that employed in the analysis of variance design.


If a large multivariate design was employed involving $n_1$ inde-
pendent variables (or predictor set 1), $n_2$ clouding variables (or
predictor set 2), and a quantified sample of behavior (criterion) to be
predicted in an experimental setting, the formula could be stated as:

$$R^2_{0(1,2,3,\ldots,n_1 \cdot 1,2,3,\ldots,n_2)} = R^2_{0 \cdot 1,2,3,\ldots,n_1,\,1,2,3,\ldots,n_2} - R^2_{0 \cdot 1,2,3,\ldots,n_2}$$

where $R_0^2$ = explained variance of the criterion;
$1,2,3,\ldots,n_1$ denote predictor variables; and
$1,2,3,\ldots,n_2$ denote clouding variables.


The above formula would yield the maximally estimated variance 
of the criterion explained by a) the predictor variables, b) the clouding 
variables, and c) the predictor-clouding variable interaction, with 
the explained variance of the criterion due to the clouding variables 
alone partialed out. The investigator could then effectively answer 
the following question: what per cent of the variance of the criterion 
is explained by the predictors in association with the clouding 
variables, after the effect of the clouding variables alone is partialed 
out? 
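
A minimal sketch of this subtraction, computed directly from a correlation matrix, may make the quantity concrete. It assumes the ordinary least-squares coefficient of multiple determination obtained from standardized regression weights, without the Wherry shrinkage or variable selection the authors apply later. The three-variable correlation matrix and the function names are hypothetical and chosen only for illustration.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def r_squared(corr, criterion, predictors):
    """Coefficient of multiple determination of the criterion on the given variables."""
    r_xx = [[corr[i][j] for j in predictors] for i in predictors]
    r_xy = [corr[i][criterion] for i in predictors]
    betas = solve(r_xx, r_xy)  # standardized regression weights
    return sum(beta * r for beta, r in zip(betas, r_xy))

def semipartial_r_squared(corr, criterion, predictors, clouding):
    """R^2 for predictors plus clouding variables, minus R^2 for clouding variables alone."""
    combined = r_squared(corr, criterion, predictors + clouding)
    clouding_only = r_squared(corr, criterion, clouding)
    return combined - clouding_only

# Hypothetical correlations: variable 0 = criterion, 1 = predictor, 2 = clouding variable
corr = [
    [1.0, 0.5, 0.3],
    [0.5, 1.0, 0.2],
    [0.3, 0.2, 1.0],
]
semipartial = semipartial_r_squared(corr, criterion=0, predictors=[1], clouding=[2])
print(round(semipartial, 4))
```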


Proposed Statistical Procedure 


An intercorrelation matrix of predictors, clouding variables, and 
the criterion is obtained (See Table 1). These intercorrelations are 
then treated as three separate sets of simultaneous regression equa- 
tions: 1) the regression of both the predictor variables and the 
clouding variables on the criterion; 2) the regression of the clouding 
variables on the criterion; and 3) the regression of the predictor 
variables on the criterion. These equations are then solved by means 
of any of several multiple regression techniques.² This first set of
simultaneous equations will yield the coefficient of multiple de-
termination ($R^2$) for the criterion when all the original predictor


² The Wherry Test Selection Method is often advantageous in studies in-
volving large numbers of predictors in that the amount of computation is
drastically reduced.


TABLE 1
Intercorrelation Matrix

Columns: Moral Value Predictors (1-6), Clouding Variables (7-12), Criterion
[Correlation entries are not legible in the source.]

where:  1 = Factor A
        2 = Religious Morality (Factor B)
        3 = Factor C
        4 = Puritanical Morality (Factor D)
        5 = Exploitative-Manipulative Morality (Factor E)
        6 = Economic Morality (Factor F)
        7 = Sex (Male-Female)
        8 = Year in School (Freshman-Upperclassman)
        9 = Socialization residence (Urban-Rural)
       10 = Religious preference (Protestant-Non-Protestant)
       11 = Parental income
       12 = Church attendance (Weekly-Irregular)


variables and all the clouding variables are treated as potential
predictors. Hence, we have the coefficient $R^2_{0 \cdot 1,2,3,\ldots,n_1,\,1,2,3,\ldots,n_2}$.
Another $R^2$ is computed, using only the clouding variables as potential
predictors. This coefficient of multiple determination is $R^2_{0 \cdot 1,2,3,\ldots,n_2}$.
To obtain an estimate of the amount of variance of the criterion
explained by the predictors and the clouding variables combined
with the variance of the criterion explained by the group of clouding
variables removed, we subtract $R^2_{0 \cdot 1,2,3,\ldots,n_2}$ from $R^2_{0 \cdot 1,2,3,\ldots,n_1,\,1,2,3,\ldots,n_2}$
to get $R^2_{0(1,2,3,\ldots,n_1 \cdot 1,2,3,\ldots,n_2)}$. This next $R^2$ term can then be compared
to the variance explained by the third equation, $R^2_{0 \cdot 1,2,3,\ldots,n_1}$. The
difference between the two yields the amount of variance explained
by the interaction of the predictors with the clouding variables.
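
The chain of subtractions just described can be written compactly. As a notational shorthand that is not the authors' own, let $P$ stand for the predictor set $1,2,\ldots,n_1$ and $C$ for the clouding set $1,2,\ldots,n_2$:

```latex
\begin{aligned}
R^2_{0(P \cdot C)} &= R^2_{0 \cdot PC} - R^2_{0 \cdot C}, \\
\text{variance due to the predictor-clouding interaction} &= R^2_{0(P \cdot C)} - R^2_{0 \cdot P}.
\end{aligned}
```

The first line removes the variance explained by the clouding variables alone; the second compares what remains with the variance explained by the predictors alone.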


An Illustrative Example: 


Some actual data obtained in a study of students' moral value
judgments and simulated classroom cheating behavior will serve as 
an illustrative example of the proposed statistical design. 

Twenty students in an introductory psychology course served as
subjects.³ The criterion was a score of overestimation of knowledge
under conditions of moral risks, i.e., "cheating" on an achievement
examination while under the risk of detection by the instructor or
by other students. The predictors were factor scores derived from
six orthogonal dimensions extracted from moral value judgments 
of college students, published elsewhere (Rettig & Pasamanick, 1959). 
Six additional measures of student socio-economic status were ob- 
tained and treated as clouding variables: sex, year in school, sociali- 
zation residence, religious preference, church attendance, and 
parental income. 

The following intercorrelation matrix was obtained (see Table 1).
For this sample, one needs $R^2_{0(123456 \cdot 789101112)}$, which represents the
variance estimate of the criterion due to the moral value factor
scores and their interaction with the clouding variables after the
separate effect of the clouding variables on the variance of the
criterion has been removed.

The Wherry Test Selection Method (Garrett, 1953) was utilized
to compute $R^2_{0 \cdot 789101112}$, the maximal variance of the criterion ex-
plained by all the clouding variables. It was found that two of the
socio-economic variables (year in school and parental income) ex-
plained a maximum amount of the variance of the criterion (22.4
per cent) (see Table 2).

³ The authors would not advocate this method for a study involving less
than 100 cases. The illustrative example with an N of 20 was used for ease of
presentation.

The same procedure was utilized to compute $R^2_{0 \cdot 123456}$, the maximal
explained variance of the criterion due to the moral value factors. 
It was found that 26.3 per cent (see Table 3) of the criterion variance 
(after shrinkage) could be explained using two of the predictor 


TABLE 2
Wherry Shrinkage Table
Clouding Variables Regression on the Criterion

 m    V²/Z    K²*    (N-1)/(N-m-1)    K̄²*     R̄²*    Clouding Variable
 1    .108    .892       1.055        .941    .059    Year in School
 2    .198    .694       1.118        .776    .224    Parental Income

* K², K̄², and R̄² columns are cumulative entries.


TABLE 3
Wherry Shrinkage Table
Predictive Variables Regression on the Criterion

 m    V²/Z    K²*    (N-1)/(N-m-1)    K̄²*     R̄²*    Predictive Variable
 1    .164    .836       1.055        .882    .118    Factor F
 2    .177    .659       1.118        .737    .263    Factor E

* K², K̄², and R̄² columns are cumulative entries.


variables: Factor F (Economic Morality) and Factor E (Exploita- 
tive-Manipulative Morality). 
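
The arithmetic behind the shrinkage tables can be sketched as follows. The relationships among the columns are inferred from the column headings and checked against the printed entries rather than quoted from the authors: the unexplained variance K² is reduced by each variable's increment, multiplied by (N-1)/(N-m-1) to give the shrunken value, and subtracted from 1 to give the shrunken R². The function name and the rounding to three places are illustrative choices.

```python
def wherry_shrinkage(increments, n_cases):
    """Return cumulative (K2, shrunken K2, shrunken R2) rows, one per variable entered.

    increments: variance increments (the V2/Z column) in order of entry.
    n_cases:    number of subjects N.
    """
    rows = []
    k_squared = 1.0
    for m, increment in enumerate(increments, start=1):
        k_squared -= increment                           # unexplained variance so far
        correction = (n_cases - 1) / (n_cases - m - 1)   # shrinkage multiplier
        k_bar_squared = k_squared * correction           # shrunken unexplained variance
        rows.append((
            round(k_squared, 3),
            round(k_bar_squared, 3),
            round(1 - k_bar_squared, 3),
        ))
    return rows

# Increments for Factor F and Factor E from Table 3, with N = 20
rows = wherry_shrinkage([0.164, 0.177], n_cases=20)
for m, row in enumerate(rows, start=1):
    print(m, row)
```

The second row gives a shrunken R² of .263, the 26.3 per cent quoted in the text.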

$R^2_{0 \cdot 123456789101112}$, the total variance of the criterion explained by
the moral value predictors, the clouding variables, and all inter-
actions, was then computed (see Table 4). It was found that 50.2
per cent of the criterion variance (after shrinkage) could be ex-
plained using three predictive variables (predictors F, E, and A)
and three clouding variables (year in school, parental income, and
sex). It should be noted that neither predictor A nor sex was in-
cluded in the respective estimates of the separate $R^2_{0 \cdot 789101112}$ or
$R^2_{0 \cdot 123456}$. Their inclusion in $R^2_{0 \cdot 123456789101112}$ is due to the interaction
between the predictor and clouding variables in regression on the
criterion.


TABLE 4
Wherry Shrinkage Table
Predictive Variables and Clouding Variables Regression on the Criterion

 m    V²/Z    K²*    (N-1)/(N-m-1)    K̄²*     R̄²*    Variable
 1    .164    .836       1.055        .882    .118    Factor F
 2    .177    .659       1.118        .737    .263    Factor E
 3    .090    .569       1.188        .676    .324    Year in School
 4    .082    .487       1.267        .617    .383    Parental Income
 5    .084    .403       1.357        .547    .453    Factor A
 6    .062    .341       1.461        .498    .502    Sex

* K², K̄², and R̄² columns are cumulative entries.

Finally, to obtain $R^2_{0(123456 \cdot 789101112)}$, which yields an estimate of
the maximal amount of explained criterion variance by the pre-
dictor variables and their interaction with the clouding variables
(with the criterion variance explained solely by the clouding variables
removed), $R^2_{0 \cdot 789101112}$ was subtracted from $R^2_{0 \cdot 123456789101112}$ (50.2% -
22.4% = 27.8%). This figure was then compared with $R^2_{0 \cdot 123456}$
(26.3%) to estimate the amount of “pureness” of the moral value
predictive measures. In other words, the moral value predictive
measures and their interactions with the clouding variables in pre-
dicting the criterion added only 1.5 per cent to the prediction of the
criterion variance explained by the moral value predictors alone.
Hence we can conclude that these clouding variables (socio-economic 
variables) play only a small role in the regression of moral values 
on the criterion, and that the moral value predictive measures are 
relatively pure of any socio-economic clouding effects. 
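
The two subtractions in this example can be retraced in a few lines. The three input values are the shrunken coefficients reported in Tables 2, 3, and 4; the variable names are illustrative.

```python
r2_total = 0.502       # predictors and clouding variables together (Table 4)
r2_clouding = 0.224    # clouding variables alone (Table 2)
r2_predictors = 0.263  # moral value predictors alone (Table 3)

# Variance explained once the clouding variables alone are partialed out
semipartial = r2_total - r2_clouding

# Portion attributable to the predictor-clouding interaction
interaction = semipartial - r2_predictors

print(f"semi-partial: {semipartial:.3f}")
print(f"interaction:  {interaction:.3f}")
```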


Summary 


A method of semi-partial multiple correlation is presented which 
enables the investigator to test the effects of any number of “cloud- 
ing variables” upon the independent and dependent variables. This 
method takes into account all interaction terms. The amounts of 
criterion variance which are explained by the predictors alone, by


the clouding variables alone, and by the predictors and the clouding 
variables, can be ascertained in a simple, meaningful fashion with a 
minimum of computational drudgery. These above terms enable the
investigator to obtain the amount of explained variance of any
criterion by a set of predictors and their interaction with a set of
clouding variables, after having partialed out the effect of the 
clouding variables alone on the criterion. 

REFERENCES

DuBois, P. H. Multivariate Correlational Analysis. New York:
Harper and Brothers, 1957.
Ezekiel, M. Methods of Correlation Analysis (Second Edition).
New York: John Wiley & Sons, 1941.
Garrett, H. E. Statistics in Psychology and Education (Fourth
Edition). New York: Longmans, Green and Company, 1953.
Rettig, S. and Pasamanick, B. “Changes in Moral Values Among
College Students: A Factorial Study.” American Sociological
Review, XXIV (1959), 856-863.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 3, 1962


SOCIAL DESIRABILITY AND THE FACTORIAL 
INTERPRETATION OF THE MMPI¹


ALLEN L. EDWARDS AND CAROL J. DIERS
The University of Washington 


Messick and Jackson (1961) have reviewed eight factor analyses
of Minnesota Multiphasic Personality Inventory (MMPI) scales.
For each factor analysis, they calculated Spearman rank correla-
tions between the proportion of items keyed True in each scale and
the loadings of the scale on the largest factor. The rank correlations
ranged in magnitude from approximately .52 to .91, and Messick
and Jackson (p. 300) state: “These strikingly consistent findings
indicate that in most of these studies the largest factor on the
MMPI is interpretable in terms of acquiescence.”

Six of the eight studies reviewed by Messick and Jackson involved
a limited number, 11 to 15, of the MMPI scales. Only two studies,
one an unpublished study by Slater (1958) and one by Kassebaum,
Couch, and Slater (1959), involved a substantial number of MMPI
scales. Slater's factor analysis was based upon 43 scales and the
Kassebaum, Couch, and Slater analysis was based upon 32 scales.
For these two studies, the rank correlations between the first factor
loadings and the proportion of items keyed True were .72 and 2,
respectively.

Edwards and Heathers (1962) obtained a product-moment cor-
relation of -.93 between the first factor loadings reported by Kasse-
baum, Couch and Slater and the proportion of items keyed for
socially desirable responses in the MMPI scales involved in their
study. On the basis of this correlation, Edwards and Heathers


¹ This research was supported in part by Research Grant M-4075 from the
National Institute of Mental Health, United States Public Health Service.


conclude that the first factor loadings of the MMPI can be inter- 
preted in terms of social desirability considerations. 

We thus have two interpretations of the first factor loadings of 
MMPI scales which, at least on the surface, appear to be in opposi- 
tion. Evidence for the interpretation of the first factor loadings in 
terms of acquiescence is based upon the correlation between an 
index of acquiescence, the proportion of items keyed True in a scale, 
and the factor loadings. Evidence for the interpretation of the first 
factor loadings in terms of social desirability is based upon the 
correlation between an index of social desirability, the proportion 
of items keyed for socially desirable responses in a scale, and the 
factor loadings. 
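
Both lines of evidence rest on correlating a keying proportion with a set of factor loadings. As a minimal sketch of the rank-correlation version attributed to Messick and Jackson, the code below computes a Spearman coefficient as the product-moment correlation of average ranks. The five data points are hypothetical and are not taken from any of the studies cited.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the tie group
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical values for five scales
proportion_keyed_true = [0.20, 0.35, 0.50, 0.65, 0.90]
first_factor_loading = [0.10, 0.40, 0.30, 0.70, 0.80]

rho = spearman(proportion_keyed_true, first_factor_loading)
print(round(rho, 2))
```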

Both Hanley (1961) and Edwards (1962) have pointed out that 
the index of acquiescence and the index of social desirability are 
often confounded. For example, if a majority of the items in a scale 
are keyed True and if, at the same time, these items have socially 
desirable (or undesirable) scale values, then there is no way to 
isolate the influence of the tendency to acquiesce from the tendency 
to give socially desirable responses. Similar considerations apply to 
scales in which a majority of the items are keyed False and when
these items also have socially desirable (or undesirable) scale 
values. Thus, one possible explanation of the Messick and Jackson 
correlations is that they are partially or wholly the result of the 
confounding of the acquiescence keying of the MMPI scales with the 
social desirability keying rather than the result of acquiescence per 
se. The present study was undertaken to investigate this possibility. 
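
The confounding described above can be made concrete by computing both indices for a single scale. In this sketch the item values, the neutral point of 5.0, and the convention that higher scale values are more desirable are all hypothetical; the keying rule (a desirable item keyed True, or an undesirable item keyed False, counts as keyed for a socially desirable response) follows the definition of the social desirability index given in the text.

```python
from dataclasses import dataclass

NEUTRAL_POINT = 5.0  # hypothetical neutral point of the desirability continuum

@dataclass
class Item:
    sd_scale_value: float  # judged social desirability (higher = more desirable, by assumption)
    keyed_true: bool       # True if the scale scores a True response

def keyed_for_sd_response(item, neutral=NEUTRAL_POINT):
    """True if the keyed response is the socially desirable one."""
    desirable = item.sd_scale_value > neutral
    return desirable == item.keyed_true

def indices(items):
    """Return (proportion keyed for SD responses, proportion keyed True)."""
    sdr = sum(keyed_for_sd_response(item) for item in items) / len(items)
    tr = sum(item.keyed_true for item in items) / len(items)
    return sdr, tr

# A hypothetical four-item scale: every item keyed True, most of them undesirable
scale = [
    Item(2.0, True),
    Item(3.0, True),
    Item(2.5, True),
    Item(7.0, True),
]
sdr_index, tr_index = indices(scale)
print(sdr_index, tr_index)
```

For a scale like this one, a high proportion of True keying goes together with a low proportion of socially desirable keying, so a correlation involving one index cannot be separated from a correlation involving the other.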

Let us assume, for example, that scores on MMPI scales under
Standard (S) instructions are influenced by both social desirability
and acquiescent tendencies and that these tendencies are reflected
in the factor loadings of the scales in such a way that the loadings
are correlated with both the index of social desirability and the
index of acquiescence. Now suppose that the MMPI is admin-
istered under instructions to give Socially Desirable (SD) responses.
If the subjects accept the instructions, then they should respond
True or False to an item primarily in terms of its judged social
desirability scale value, i.e., if the item is judged to have a socially
desirable scale value, they should respond True, whereas if the item
is judged to have a socially undesirable scale value, they should
respond False. Since there are individual differences in judgments


of social desirability scale values, we may still expect some varia- 
tion in the subjects’ scores on the various MMPI scales under SD 
instructions. It seems reasonable to believe, however, that under 
these instructions acquiescent tendencies will be minimized. Simi- 
larly, if the MMPI is administered under instructions to give 
Socially Undesirable (SUD) responses, then it seems reasonable 
to believe that acquiescent tendencies under these instructions will 
be minimized also. 

If the first factor loadings under S instructions are interpretable 
in terms of acquiescence, then there should be little or no relation- 
ship between the index of acquiescence and the factor loading under 
SD and SUD instructions. On the other hand, if the relationship 
between the factor loadings and the index of acquiescence under S 
instructions is the result of the confounding between the acquies-
cence keying and the social desirability keying, then we should
expect little or no change in the relationship under SD and SUD 
instructions. 


Method 


The MMPI was administered to 150 male students at Western
Washington College of Education under S instructions. Approxi-
mately one week later, 120 of the original members of the S group
were administered the MMPI under SD instructions. Another inde-
pendent group of 150 males at Central Washington College of
Education was administered the MMPI under SUD instructions.²

For each of the three sets of instructions, MMPI scores were
obtained on 58 MMPI scales. Intercorrelations of the scales were
then obtained separately for each set of instructions and the three
resulting correlation matrices were factor analyzed by the method
of principal components. The present study is concerned only with
the loadings on the first or largest unrotated factor, since it is the
interpretation of this factor in terms of acquiescence and social
desirability that is of interest. The proportion of the total variance
accounted for by the first factor under S, SD, and SUD instructions
was .38, .46, and .42, respectively.³
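
The first unrotated principal component and its share of the total variance can be obtained from a correlation matrix as sketched below. Power iteration is used here only because it is short and needs no external library; it is not a description of how the authors computed their solution, and the three-variable correlation matrix is hypothetical. The sketch assumes the starting vector of ones is not orthogonal to the leading component, which holds for matrices with predominantly positive correlations.

```python
def first_principal_component(corr, iterations=500):
    """Return (loadings, proportion of total variance) for the largest component."""
    n = len(corr)
    vector = [1.0] * n
    for _ in range(iterations):
        product = [sum(corr[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in product) ** 0.5
        vector = [v / norm for v in product]
    # Rayleigh quotient of the unit-length vector gives the largest eigenvalue
    eigenvalue = sum(
        vector[i] * sum(corr[i][j] * vector[j] for j in range(n)) for i in range(n)
    )
    loadings = [v * eigenvalue ** 0.5 for v in vector]
    # Each standardized variable contributes unit variance, so total variance is n
    return loadings, eigenvalue / n

# Hypothetical correlation matrix for three scales
corr = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
loadings, proportion = first_principal_component(corr)
print([round(value, 3) for value in loadings], round(proportion, 3))
```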



* We are indebted to Dr. Charles W. Harwood at Western Washington Col- 
lege of Education and Dr. Eldon E. Jacobsen at Central Washington College 
of Education for assistance in obtaining the MMPI records. 

* The proportion of the total variance accounted for by the second factor 
Under 8, SD, and SUD instructions was 10, 10, and .14, respectively. No sub- 


504 EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 


Using Heineman's social desirability scale values for the MMPI - 


items, as given by Dahlstrom and Welsh (1960), we obtained the 
proportion of items keyed for socially desirable responses in each 
of the scales. In determining whether or not an item was keyed for a 
socially desirable response, we arbitrarily considered items with 
scale values on the socially desirable side of the neutral point on the 
psychological continuum as having socially desirable scale values 
and those on the other side of the neutral point as having socially 
undesirable scale values. An item was then considered to be keyed 
for a socially desirable response if it had a socially desirable scale 
value and was keyed True, or if it had a socially undesirable scale 
value and was keyed False. The proportion of items keyed for 
Socially Desirable Responses (SDR) in each scale was used as our 
index of social desirability. Similarly, for each scale we found the 
proportion of items keyed for True Responses (TR) and this pro- 


portion, following Messick and Jackson, was used as an index of 
acquiescence. 


Results and Discussion 


Table 1 gives the product moment correlations of the two indices, 
SDR and TR, with the loadings of the scales on the first factor 
under each set of instructions. Under S instructions the TR index 
correlates —.73 with the first factor loadings of the 58 MMPI scales. 
Messick and Jackson obtained a similar correlation of .72 between 
the TR index and the first factor loadings for 43 MMPI scales under 
S instructions. We note, however, that the relationship between the 
TR index and the first factor loadings changes very little under SD 


TABLE 1 


Correlations of the Proportion of Items Keyed for Socially Desirable Responses 
(SDR) and of the Proportion Keyed for True Responses (TR) with 
the First Factor Loadings under Three Sets of Instructions 


                                 Index 
Instructions                SDR       TR 
Standard                    .89      -.73 
Socially desirable          .92      -.69 
Socially undesirable        .94      -.66 

sequent factor, under any of the sets of instructions, accounted for more than 
.08 of the total variance. 




(r = —.69) and SUD (r = —.66) instructions. We would argue 
that under SD and SUD instructions, acquiescent tendencies should 
be minimized and that the TR index should, therefore, have little 
relationship with the factor loadings under these instructions. That 
the relationship remains relatively unchanged under SD and SUD 
instructions is, we believe, the result of the fact that the TR index 
is confounded with the SD index. 

Under all three sets of instructions the SD index correlates highly 
and positively with the first factor loadings. These correlations show 
that the first factor loadings can be interpreted in terms of social 
desirability considerations, regardless of the conditions under which 
the MMPI is administered. 

The factor loadings under each set of instructions are given in 
Table A.* That these factor loadings are quite stable under different 
instructions is shown by the fact that the loadings under SD and 
SUD instructions both correlate .97 with the loadings under S 
instructions. The correlation between the SD and SUD loadings is 
.99. 

Further evidence that the first factor loadings are interpretable 
in terms of social desirability is available from the correlations of 
the loadings of the scales on this factor with the zero-order correla- 
tions of the scales with the Social Desirability Scale (SDS). The 
SDS was developed by Edwards (1957) as a measure of the 
tendency of subjects to give socially desirable responses. The scale 
consists of 39 MMPI items, all of which are keyed for the socially 
desirable response. Scores on the SD scale were correlated with 
scores on each of the other MMPI scales under each set of instructions. 
These zero-order correlations were then correlated with the 
loadings of the scales on the first factor. These correlations are 
shown in Table 2. 

As the correlations given in Table 2 show, the zero-order correla- 
tions of the MMPI scales with the SDS under any one of the three 
sets of instructions are highly related to the first factor loadings of 
the scales under any of the three sets of instructions. The zero-order 
correlations of the SDS with the other MMPI scales are themselves 


* A 1-page table listing the 58 scales and giving the unrotated factor loadings 
under the three sets of instructions has been deposited with the American 
Documentation Institute. Order Document No. 7196, remitting $1.25 for 35-mm. 
microfilm or $1.25 for photocopies from Chief, Photoduplication Service, 
Auxiliary Publications Project, Library of Congress, Washington 25, D. C. 




TABLE 2 


Correlations between the Zero-order Correlations of MMPI Scales with the Social 
Desirability Scale under Three Sets of Instructions with the First Factor 
Loadings of the Scales under Three Sets of Instructions 


              S        SD       SUD 
r_S          .99      .96      .96 
r_SD         .97     1.00      .99 
r_SUD        .96      .97      .99 


tained under SD and SUD instructions. The correlations under SD 
instructions correlate .98 with those under SUD instructions. 

The evidence we have presented shows that the first factor load- 
ings of the MMPI scales can be predicted quite accurately from 
either the proportion of items keyed for socially desirable responses 
or from the zero-order correlations of the scales with the 39-item 
SDS. The relationships hold regardless of whether the MMPI is 
administered under S, SD, or SUD instructions. We believe the 
results support the interpretation of the first factor loadings in terms 
of social desirability rather than in terms of acquiescence. 

It might be argued, however, that our instructions to the subjects 
to give SD and SUD responses were relatively ineffective and that 
they were responding in terms of acquiescent tendencies, despite the 
instructions given them. If this were actually the case, then the relationship 
between the TR index and the first factor loadings should 
be, as we found, much the same for all three sets of instructions. 
This argument can easily be discounted in view of other evidence. 

For each MMPI scale we obtained the number of items keyed 
True, the number keyed False, and the number keyed for SD and 
SUD responses. If the subjects accepted the instructions to give SD 
responses, then, within the limits of error of judgment of the social 
desirability scale values of the items, the mean score on each scale 
under SD instructions should approach the number of items in the 
scale keyed for SD responses. On the other hand, if they failed to 
accept the instructions and were responding on the basis of acquiescent 
tendencies, then the mean score should approach the 
number of items keyed for True responses. Similarly, under SUD 
instructions the mean should approach the number of items keyed 




for SUD responses. Table 3 shows the correlations between the 
number of items keyed for True, False, SD, and SUD responses and 
the means of the scales for each set of instructions. 

The correlations shown in Table 3 clearly support our belief that 
the subjects did, in fact, respond to the items in accordance with the 
instructions given them and not in terms of acquiescent tendencies. 
The correlation of .78 between the number of items keyed True and 
the means under SUD instructions can be accounted for, we believe, 
in terms of the relatively high correlation of .74 between the number 
of keyed True items in the scales and the number keyed for SUD 
responses. Similarly, the number of items keyed for False responses 
in the scales correlates .48 with the number keyed for SD responses 
and this accounts, we believe, for the fact that the number of items 
keyed False correlates with the means under S and SD instructions. 
Of interest is the fact that the number of items keyed for SD re- 
sponses also correlates quite highly with the means of the scales 
under S instructions. 

It has been suggested by Hanley (1956) and Edwards (1957) that 
if a scale contains a large proportion of items with neutral social 
desirability scale values, scores on it are less likely to be influenced 
by social desirability tendencies. With social desirability tendencies 
minimized, we might expect acquiescent tendencies to become more 
influential. Thus, if a scale contains a large proportion of neutral 
items, acquiescent tendencies may be of greater importance than 
social desirability tendencies. This point seems to be accepted by 
Messick and Jackson in their review. The point is also relevant to 
our interpretation of the first factor loadings in terms of social 
desirability. 

If the first factor loadings involve primarily acquiescent tenden- 


TABLE 3 


Correlations of the Number of Items Keyed T, F, SD, and SUD with 
MMPI Means under Three Sets of Instructions 


                                  Means 
                            S        SD       SUD 
No. items keyed T          .17     -.11      .78 
No. items keyed F          .53      .58      .00 
No. items keyed SD         .82      .98     -.43 
No. items keyed SUD       -.12     -.47      .98 




cies and if these tendencies are most operative when scales have a 
large proportion of neutral items, then the proportion of neutral 
items in the scales should correlate positively with the absolute 
values of the first factor loadings. On the other hand, if the first 
factor loadings involve primarily social desirability tendencies, then 
the proportion of neutral items in the scales should correlate nega- 
tively with the absolute values of the first factor loadings. 

For each of the MMPI scales we obtained the proportion of items 
falling in the neutral interval on the social desirability continuum 
and this index was then correlated with the absolute values of the 
factor loadings under all three sets of instructions. These correlations 
for S, SD, and SUD instructions are -.42, -.46, and -.44, 
respectively. The fact that these correlations are all negative is 
consistent with our interpretation of the first factor loadings in 
terms of social desirability. 
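
The check described above amounts to correlating one index per scale with the absolute value of that scale's first factor loading and inspecting the sign. A minimal sketch follows; the four pairs of values are invented purely for illustration and are not the study's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product moment correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical values for four scales (not taken from the article).
proportion_neutral = [0.10, 0.30, 0.50, 0.20]
first_factor_loadings = [-0.90, 0.60, 0.20, 0.80]

# The sign of the loading is discarded before correlating.
r = pearson_r(proportion_neutral, [abs(v) for v in first_factor_loadings])
```

With these invented numbers the scales having more neutral items have smaller absolute loadings, so `r` is negative, the direction the social desirability interpretation predicts.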


Summary 


First factor loadings on 58 MMPI scales were obtained under 
Standard (S) instructions, under instructions to give Socially 
Desirable (SD) responses, and under instructions to give Socially 
Undesirable (SUD) responses. The first factor loadings were stable 
over the three sets of instructions and highly correlated. The first 
factor loadings of the scales under S, SD, and SUD instructions 
correlated .89, .92, and .94, respectively, with the proportion of items 
keyed for socially desirable responses in the scales. The results indicate 
that the first factor of the MMPI can be interpreted in terms 
of social desirability. 


REFERENCES 


Dahlstrom, W. G. and Welsh, G. S. An MMPI Handbook. Minneapolis: 
University of Minnesota Press, 1960. 

Edwards, A. L. The Social Desirability Variable in Personality 
Assessment and Research. New York: Dryden, 1957. 

Edwards, A. L. "Social Desirability or Acquiescence in the MMPI? 
A Case Study with the SD Scale." Journal of Abnormal and 
Social Psychology, LXIII (1961), 351-359. 

Edwards, A. L. and Heathers, Louise B. "The First Factor of the 
MMPI: Social Desirability or Ego Strength?" Journal of Consulting 
Psychology, XXVI (1962), 99-100. 

Hanley, C. "Social Desirability and Responses to Items from Three 
MMPI Scales: D, Sc, and K." Journal of Applied Psychology, 
XL (1956), 324-328. 

Hanley, C. "Social Desirability and Response Bias in the MMPI." 
Journal of Consulting Psychology, XXV (1961), 13-20. 

Kassebaum, G. G., Couch, A. S. and Slater, P. E. "The Factorial 
Dimensions of the MMPI." Journal of Consulting Psychology, 
XXIII (1959), 226-236. 

Messick, S. and Jackson, D. N. "Acquiescence and the Factorial 
Interpretation of the MMPI." Psychological Bulletin, LVIII 
(1961), 299-304. 

Slater, P. E. "Personality Structure in Old Age." Progress Report, 
1958, Age Center of New England, Project M-1402, National 
Institute of Mental Health. Cited in Messick and Jackson 
(1961). 


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 3, 1962 


TEST RELIABILITY—A CORRECTION 


FREDERIC M. LORD 
Educational Testing Service 


In several articles (Lord, 1955a, p. 8; 1955b; 1959a), the writer 
has pointed out that $\operatorname{Var} z_a$, the estimated squared standard error 
of measurement of examinee $a$ for randomly parallel tests, is, when 
averaged over all examinees, identically the same as the squared 
standard error of measurement, $\mathrm{S.E.}^2_{\mathrm{meas.}}$, computed by the usual 
formula from the Kuder-Richardson Formula-21 reliability coefficient, 
$r_{21}$: 

$$\mathrm{S.E.}^2_{\mathrm{meas.}} \equiv s_z^2\,(1 - r_{21}) \equiv \frac{1}{N} \sum_{a=1}^{N} \operatorname{Var} z_a , \tag{1}$$

where $N$ is the number of examinees and $s_z$ is the standard deviation 
of the observed scores ($z$). 
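
The identity can be checked numerically. The sketch below assumes number-right scores on an n-item test, the usual Kuder-Richardson Formula 21 with the score variance computed with divisor N, and the binomial-model estimate x(n - x)/(n - 1) for each examinee's squared standard error of measurement. That per-examinee formula is not stated in this note and is taken here as an assumption; the seven scores are made up.

```python
def population_variance(scores):
    """Variance of the observed scores with divisor N."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / len(scores)


def kr21(scores, n_items):
    """Kuder-Richardson Formula 21 from number-right scores."""
    mean = sum(scores) / len(scores)
    var = population_variance(scores)
    return (n_items / (n_items - 1)) * (1 - mean * (n_items - mean) / (n_items * var))


def se2_from_kr21(scores, n_items):
    """Squared standard error of measurement by the usual formula s^2 (1 - r21)."""
    return population_variance(scores) * (1 - kr21(scores, n_items))


def mean_examinee_error_variance(scores, n_items):
    """Average over examinees of the assumed estimate x (n - x) / (n - 1)."""
    return sum(x * (n_items - x) / (n_items - 1) for x in scores) / len(scores)


# Hypothetical number-right scores of seven examinees on a 20-item test.
scores = [12, 15, 9, 18, 14, 7, 16]
lhs = se2_from_kr21(scores, n_items=20)
rhs = mean_examinee_error_variance(scores, n_items=20)
```

Under these assumptions `lhs` and `rhs` agree to within floating-point rounding for any set of scores with nonzero variance, which is what it means for the relation to be an identity rather than an empirical finding.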

This result is a mathematical identity, as shown most clearly in 
the second paper (Lord, 1955b, p. 329 and p. 334). It follows immediately 
that 

$$r_{21} \equiv 1 - \frac{\frac{1}{N} \sum_{a=1}^{N} \operatorname{Var} z_a}{s_z^2} . \tag{2}$$

Equation (2) is also a mathematical identity. It provides justification 
for the use of $r_{21}$ as a measure of reliability in situations 
where every examinee takes a different test or is rated by a different 
rater. It appears to be wrong, however, to assert, as the author has 
done, that this relationship justifies $r_{21}$ as a reliability coefficient in 
those situations where all examinees take the same test. The difficulty, 
clearly perceived and called to the author's attention by 
Rajaratnam and Cronbach, is as follows. 




The errors of measurement represented in the $\operatorname{Var} z_a$ in the 
numerator of (2) include all discrepancies between the observed 
score, $z_a$, and the true score (which for the present case is the mean 
score of examinee $a$ on all possible randomly parallel tests). In particular, 
these errors include discrepancies arising because different 
randomly parallel test forms happen to be exceptionally hard or 
easy. This last source of error, however, is not represented in the 
term $s_z^2$ in the denominator if $s_z^2$ is calculated from the data obtained 
on just one test form. It is difficult to provide a useful interpretation 
for the ratio between such a numerator and denominator. 

Appropriate reliability coefficients may be obtained from the 
familiar definition 

$$r = 1 - \frac{\mathrm{S.E.}^2_{\mathrm{meas.}}}{s_z^2}$$

only if one ensures that the errors represented in the numerator are 
also represented in the denominator. In the case of randomly parallel 
tests, one possible modification of the denominator leads to a new 
reliability coefficient (Lord, 1959b, eq. 47; Lord, 1959c, eq. 6; 
Rajaratnam, 1960), differing slightly from $r_{21}$, valid in certain specific 
situations. 

It is, of course, clear that various kinds of reliability coefficients 
may be formulated, depending on which errors of measurement are 
to be taken into account (e.g., see Ebel, 1951). Different coefficients 
will be appropriate under different circumstances. 


REFERENCES 
Ebel, R. L. "Estimation of the Reliability of Ratings." Psychometrika, 
XVI (1951), 407-424. 

Lord, F. M. "Sampling Fluctuations Resulting from the Sampling 
of Test Items." Psychometrika, XX (1955), 1-22. (a) 

Lord, F. M. "Estimating Test Reliability." Educational and Psychological 
Measurement, XV (1955), 325-336. (b) 

Lord, F. M. "Randomly Parallel Tests and Lyerly's Basic Assumption for 
the Kuder-Richardson Formula (21)." Psychometrika, XXIV 
(1959), 175-178. (a) 

Lord, F. M. "Statistical Inferences about True Scores." Psychometrika, 
XXIV (1959), 1-17. (b) 

Lord, F. M. "An Approach to Mental Test Theory." Psychometrika, 
XXIV (1959), 283-302. (c) 

Rajaratnam, N. "Reliability Formulas for Independent Decision 
Data When Reliability Data Are Matched." Psychometrika, 
XXV (1960), 261-271. 


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 3, 1962 


MULTIPLE HIERARCHICAL CLASSIFICATION OF 
INSTITUTIONS AND PERSONS WITH REFERENCE TO 
UNION-MANAGEMENT RELATIONS AND 
PSYCHOLOGICAL WELL-BEING 


LOUIS L. McQUITTY 
Michigan State University 


This paper is concerned primarily with efforts to develop statisti- 
cal methods for assessing individual differences in mental health; 
it shows, nevertheless, that the methods can be applied to develop- 
ing a taxonomy of institutions, as well as people, illustrating this 
fact by application of the method to companies which differ in 
characteristics of union-management relationships. 

If statistics are to be developed for some such purpose as the 
assessment of mental health status, then it is helpful to have sub- 
stantive insights; we need to have theories as to the nature of indi- 
vidual differences in mental health. We are not limited to one 
theory; we can have many theories and they can be tested successively. 
By means of these empirical tests, we learn which theory 
or set of theories is the most promising. 


Substantive Theory Versus Statistical Methods 


There is, however, one basic requirement if statistical methods 
are to be developed for assessing individual differences in mental 
health; the theories must be relatively exact and definitive. If the 
theories do not meet these requirements, they cannot guide the de- 
velopment of statistical methods; statistical methods are exacting 
and their development for testing substantive theories must derive 
from theories which are themselves exacting and specific. 

One of the advantages of translating a substantive theory for 
statistical analysis is that it forces careful and exact statement of 
the theory. However, the theory should not be relegated to a secondary 
role; the theory should determine the method rather than 
the available methods determining the theory. 

Mental measurement in psychology assumes that a response 
means the same thing irrespective of who gives it and in what 
combination of other responses it occurs. This assumption seems 
to be reasonably appropriate in some psychological areas. There is, 
for example, a rather large area of objective reality in which most 
adults agree; a penny is a penny and is recognized as such by adults. 
Therefore, when a young child responds “penny” upon being shown 
one, it is legitimate to conclude that his response is indicative of 
intelligence above a certain minimum. 

Even though the response, penny, may reflect a certain amount 
of intelligence, the failure to respond in this manner does not neces- 
sarily mean the lack of the prerequisite amount of intelligence; we 
know that it is possible to teach (condition) even a bright child to 
respond in no way other than fearfully to a penny. Furthermore, a 
child can be differentially trained so that a penny will elicit either 
the vocalization of penny or the reaction of fear, depending on how 
it is presented. 

The above example illustrates that the response depends on the 
training in relation to the context of presentation of stimuli. Consequently, 
responses appear to have the potentiality of assessing individual 
differences in experiences. However, they cannot yield this 
wealth of individual differences in experiences if a response must 
always be interpreted to have the same meaning irrespective of who 
gives it; responses can be interpreted in this manner only if all 
people have had homogeneous experiences. When we use a penny in 
measuring intelligence, we assume that all children have had reasonably 
similar experience with pennies. By the assumptions inherent 
in measurement approaches we reduce the use of responses as indicators 
of individual differences in experience. 

Responses which have the same meaning across all persons are 
quantitative responses. Qualitative responses, on the other hand, 
reflect various meanings, depending on who gives them. We have at 
least two options with respect to qualitative-type responses. We 
can discard them as inappropriate to psychological measurement; 
this is customary practice and probably means that we are discarding 
those responses which are potentially most fruitful for understanding 
behavior related to individual differences in experiences. 
On the other hand, we can attempt to develop numerical methods 
which can be applied to reflect individual differences in experiences. 


A Theory of the Personality Structure of Mental Illness 


Many kinds of mental illness are presumed to develop out of 
rather atypical patterns of experience. Responses which reflect these 
experiences vary with the experience which individuals have had. 
The responses do not, however, vary in a one-to-one relationship 
with the patterns of experience. There is no one crucial response 
such that if it is present or absent a particular pattern of experi- 
ences is also present or absent. Instead, there are patterns of re- 
sponses which do hold a one-to-one relationship with the patterns of 
experience. For every pattern of experience there is presumed to be 
a pattern of responses, and if a particular pattern of responses is 
present then both the pattern of experience and its associated mental 
illness are also present. 

A problem for objective appraisal of mental health is to discover 
statistical methods which will assist in isolating patterns of re- 
sponses which hold a one-to-one correspondence with classes of 
mental illness. 

In an effort to develop a statistical method to assist in objective 
appraisal of mental illness, we are confronted with the problem that 
the classes of mental illness themselves are not known in any ob- 
jective sense. 

One approach to our problem is to attempt to discover the classes 
of mental illness out of the patterns of responses which are pre- 
sumed to be associated with them. We must discover those patterns 
of responses which mental patients give and non-mental patients 
do not give; each such pattern will represent a class of mental 
illness. 

The next problem is to specify the responses in terms of which 
mental illness expresses itself. A complication here is this: there is 
no one set of responses in terms of which all classes of mental illness 
express themselves. One class of mental illness expresses itself in one 
set of responses, and another class expresses itself in another set, 
with various degrees of overlap between the responses which are 
pertinent to the various classes. 

Normal people are presumed to reflect normal personality types 
in the same fashion as patients reflect classes of mental illness. For 
each type of normality, there is presumed to be a set of responses 
which reflects it, and the many sets for the many types overlap in 
various degrees. 

Not only do the healthy sets overlap with one another as do the 
pathological sets, but the healthy sets overlap with the pathological 
ones. Whether a response reflects mental health or mental illness 
depends on the set of other responses with which it occurs; a response 
reflects mental health or mental illness depending on whether 
it is expressed in a healthy or pathological set. Consequently, responses 
which are indicative of mental health states are qualitative 
as opposed to quantitative; the diagnostic indicant to be assigned 
to them depends on the patterns in which they occur; the interrelationships 
are other than linear. 


A Problem in Objective Assessment of Mental Health 


Our problem in objective assessment of mental health is straightforward. 
It is first to define the universe of responses which pertain 
to any one or more pathological sets. Stated in other words, it is to 
define all of the responses which are sometimes indicative of mental 
pathology. For this purpose we must depend on the insights of the 
clinician. 

Starting with our definition of pathological responses, the next 
task is to develop test items designed to elicit pathological responses 
when appropriate. This is a crucial problem: how do we abstract 
qualitative data for numerical analysis without losing the differential 
values presumed to be inherent in the appropriate setting (the 
diagnostic, clinical or life situation)? The statistical methods are of 
little or no value unless this problem is adequately solved. It is not, 
however, the problem to which this paper is addressed. The test 
items must be carefully selected so that they are representative of 
the universe of pathological responses. 

The test items are to be used to elicit responses from both pa- 
tients and normals. The subjects are to be classified into categories 
by statistical methods on the basis of their responses to the items. 

For every subject, there would be a pattern of responses. These 
are called individual patterns, for no person is likely to agree with 
any other person on all responses. 

In analyzing these responses, we must recall that individual patterns, 
even of patients, may reflect more normality than pathology; 
this is because the universe of pathological responses is far larger 
than those reflected by any one class of mental illness. Consequently, 
in terms of all of their responses, most patients might be found to be 
more like normals than patients. We cannot solve our problem of 
mental health appraisal by classifying subjects in terms of their 
predominant patterns only. The question which we must ask is this: 
Is there any significant pattern of responses in which a person is 
more like patients than normals? If yes, then he is diagnosed by the 
objective methods to be a patient. The test of the method is to check 
the objective diagnoses against clinical judgment. If they agree 
reasonably well with clinical judgment, then they can possibly be 
further improved through clinical insights and statistical procedures 
until the objective method is superior to clinical judgment. 

Clinical insights, data collection, and statistical methods must 
all three work in close collaboration; both data collection and sta- 
tistical analysis develop out of a substantive theory; results from 
them yield new insights and a refined theory. Statistical methods 
are revised accordingly, reapplied to appropriate data, with additional 
insights and revisions, and thus the research continues in a 
long and laborious fashion with many blind alleys, a few minor suc- 
cesses, and sometimes even a major “breakthrough.” 

Our purpose is to classify every subject first in terms of his pre- 
dominant patterns and then again in terms of his next most pre- 
dominant patterns, using only the responses not involved in the first 
classification, and continuing this process until every subject has 
been classified in terms of all of his responses. 

It is hypothesized that almost every patient will classify eventu- 
ally into a category which is nearly all patients, and few if any 
normals will classify into such categories. These hypotheses will 
hold, however, only if the analysis is performed on several hundred 
subjects. Many subjects are required because both many classes of 
mental illness and many types of normality are presumed. If all of 
both of them are not well represented in the subjects studied, then at 
least some of the subjects will be forced into artificial categories by 
the analysis, and the hypotheses cannot then be expected to apply. 


The Method of Analysis 
The method of analysis is an elaboration of Hierarchical Syndrome 
Analysis (McQuitty, 1960). This method classifies subjects 
into categories which are based on their predominant patterns of 
responses. The method is here elaborated so that it classifies each 
subject not only in terms of his most predominant patterns, but then 
again in terms of his next most predominant patterns, using only 
response patterns not utilized in the first classification; it continues 
in this vein until every subject has been classified in all of his responses. 
The method is called Multiple Hierarchical Syndrome 
Analysis. 

After a matrix of interassociations between subjects has been 
computed, the method requires no mathematics more complicated 
than recognizing which of two numbers is larger and elementary 
subtraction. 

The method is here developed in terms of patterns of characteris- 
tics of industrial companies. The purpose is to classify the com- 
panies first into their most predominant patterns and then into their 
next most predominant patterns. Industrial companies, rather than 


TABLE 1 
Agreement Scores between Companies* 


[Table body not recoverable from the source: a matrix of agreement scores among Companies A through H and the groups formed from them, with the steps in the analysis marked along the top and left margins.**] 


* Data from McQuitty, 1954. 

** Capital letters refer to companies; the numbers above them at the top of the table (and to the left of them on the left of the table) specify the order in which every company (or group of companies) was classified. 




individuals, were chosen for this study because it is thought that 
there are many fewer patterns for institutions than for individuals; 
the method can be realistically illustrated with many fewer cases. 


Isolating the Predominant Patterns 


The largest square of Table 1 reports interrelationships between 
eight industrial companies: two construction companies (A and B), 
two trucking companies (C and D), one grain-processing company (E), 
one metal-products company (F), and two garment-manufacturing 
companies (G and H). Each company was appraised as either above or 
below average on each of 32 variables in the field of 
union-management relationships. 

The index of association between the companies is an agreement 
score. If two companies are both above average on an item, this is 
one agreement. Also, if they are both below average on an item this 
is an agreement. But, if one is above and the other is below average 
on an item this is not an agreement. The agreement score between 
two companies is the number of items on which they agree. 
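
The agreement score just defined is a simple matching count. A minimal sketch, assuming each company's appraisals are stored as a list of booleans (True for above average, False for below average); the function name and the two six-item profiles are hypothetical.

```python
def agreement_score(profile_a, profile_b):
    """Number of items on which two companies are both above average
    or both below average."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same items")
    return sum(a == b for a, b in zip(profile_a, profile_b))


# Hypothetical appraisals on six items (True = above average).
company_x = [True, True, False, False, True, False]
company_y = [True, False, False, True, True, False]
score = agreement_score(company_x, company_y)
```

Here the two made-up companies match on the first, third, fifth, and sixth items, so `score` is 4. With the 32 items used in the study the score would range from 0 to 32.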

There are two versions of Hierarchical Agreement Analysis: (a) 
the Replacement and (b) the Self-Checking Version. The Replace- 
ment Version is less dependable than the Self-Checking Version 
when data are not well structured. However, the data at hand are 
well structured in the case of the predominant patterns but less well 
structured in the case of the next most predominant patterns. There- 
fore the Replacement Version will be used for isolating the pre- 
dominant patterns, followed in turn by the Self-Checking Version 
for isolating the next most predominant patterns. 

The first step in both versions is to underline the largest entry in 
each column of the original matrix. This has been done for the 
largest square of Table 1. Then one selects the largest entry in the 
matrix. In this case, it is 29 and mediates between Companies A and 
B. Companies A and B are then grouped as a type to form Row and 
Column AB. 

The problem is to determine the entries for the new column and 
row. How do we determine, for example, how much Company C has 
in common with Type AB? This is accomplished by application of 
the classification assumption. We assume that Companies AB and C 
have as much in common as the pair with the least in common. The 
pairs are AB, AC, and BC. We already know that AB has more in 
common than either AC or BC because it has more in common than 
any other pair of the original matrix. By reference to the table we 
can determine which pair, AC or BC, has more in common; we refer 
to the intersections of Columns A and B with Row C, and we find 
that AC and BC have agreement scores of 16 and 17, respectively. 
Consequently, pair AC with an agreement score of 16 has least in 
common, and it is therefore assumed that ABC have an agreement 
score of 16. Sixteen is entered as the agreement score in both cells: 
Column AB-Row C and Row AB-Column C. The other entries of 
Column and Row AB are determined in the same fashion. 

The next step in the Replacement Version is to eliminate both 
Rows and Columns A and B from further analysis; they are re- 
placed by Row and Column AB. This action defines a revised matrix, 
which is enclosed by the intermediate size square of Table 1. 

The matrix of the intermediate size square is analyzed in the 
same fashion as was the original matrix to yield the matrix of the 
smallest square shown in Table 1. Each time a matrix is analyzed, 
the next matrix is smaller by one column and one row. Consequently, 
if there are n variables in the original matrix there are n-1 steps in 
the analysis, and each step in the analysis is performed in the same 
manner as the first step, outlined above. 
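
The Replacement Version as described can be sketched as a short loop: find the largest entry, group that pair, fill the new row and column with the smaller of the two old entries, and repeat. The sketch is a reading of the text and not the author's own procedure in every detail; the underlining of column maxima is omitted, ties are broken by taking the first largest entry found, and the dictionary layout and function name are assumptions. The scores for AB, AC, and BC are those quoted in the text; the scores for AD, BD, and CD are hypothetical. The minimum rule is what is now usually called complete-linkage clustering applied to a similarity matrix.

```python
def replacement_analysis(labels, scores):
    """Repeatedly group the pair with the largest agreement score.

    labels: list of single-letter company names.
    scores: dict mapping frozenset({x, y}) to the agreement score of x and y.
    Returns the groups in the order formed, each with its agreement score.
    """
    groups = list(labels)
    table = dict(scores)
    history = []
    while len(groups) > 1:
        # Locate the largest entry in the current matrix.
        pairs = [(g, h) for i, g in enumerate(groups) for h in groups[i + 1:]]
        g, h = max(pairs, key=lambda p: table[frozenset(p)])
        merged_score = table[frozenset((g, h))]
        merged = "".join(sorted(g + h))
        rest = [k for k in groups if k not in (g, h)]
        # Classification assumption: the new group has as much in common
        # with each remaining entry as the pair with the least in common.
        for k in rest:
            table[frozenset((merged, k))] = min(
                table[frozenset((g, k))], table[frozenset((h, k))]
            )
        # Rows and columns g and h are replaced by the merged row and column.
        groups = [merged] + rest
        history.append((merged, merged_score))
    return history


# AB, AC, BC come from the text; AD, BD, CD are hypothetical.
example_scores = {
    frozenset("AB"): 29, frozenset("AC"): 16, frozenset("BC"): 17,
    frozenset("AD"): 18, frozenset("BD"): 19, frozenset("CD"): 26,
}
steps = replacement_analysis(["A", "B", "C", "D"], example_scores)
```

With four companies the loop runs three times, in line with the n-1 steps noted above: A and B are grouped at 29, then C and D at 26, and finally the two groups are joined at 16, the smallest score linking any member of one group with any member of the other.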

It is helpful to mention one precaution in connection with the
analysis; the successive matrices for analysis do not form squares
in the sense just illustrated unless the original matrix is arranged so
that the largest entry mediates between the first two variables, the
largest entry for the remaining variables between the next two, etc.
It is not, however, necessary to have them arranged in this order.
The arrangement facilitates description of the analysis, and they
were appropriately arranged for the first three matrices, but not
for the fourth matrix. Table 1 shows how the fourth matrix was
analyzed in the same fashion as described except for the requirement
to arrange the entries in a square.


Table 1 carries the analysis to its logical conclusion; the process
was continued until all companies were classified together. We now
ask whether or not it is appropriate for our purpose to carry the
analysis this far.

The purpose of the classification is to maximize the number of
characteristics which are subsumed within the categories chosen. In


our present approach we are classifying responses. A response is an
answer to a test item by an individual (or in this case a company);
answers to two items by one individual represent two responses, as
do answers to one item by two individuals. Consequently,
the number of responses classified by any category in the above
analysis is the number of individuals in a category multiplied by the
agreement score of the category. For example: A and B were classified
together because they agree on 29 items (Table 1). The classification
capacity of this category is therefore 2 × 29 = 58. The first
thing to notice is that 58 is larger than 32, the number of items in the 
test used in the study. If this result had not been obtained, then the 
companies would have agreed in no more than half the items, and 
the classification would not have been justified by this result alone. 
The classification capacity for categories CD, GH, EF, ABCD, and 
EFGH are 52, 48, 42, 64 and 52, respectively. Throughout all of 
these classifications, each company moves into a category classifying
additional responses. Company C, for example, as an individual
company is described in terms of 32 items; only 32 responses are
involved. It then joined Company D in a category where 48 responses
are classified, and finally C, together with D, joined AB
where 64 responses are classified. A similar progression holds for 
each of the seven other companies. 

The next step in the analysis brings all of the companies together, 
where they are reported by the method to agree on six items, and, 
since there are eight companies, the classification capacity of this 
category is 6 × 8 = 48, which is less than that of any of the immediately
preceding categories into which the companies have been classified.
Therefore it should not be allowed. The Principle of Maximum 
Classification disallows the entry of a category into another cate- 
gory with a lesser classification capacity. 
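
The arithmetic behind the Principle of Maximum Classification can be checked with a short Python sketch. The function names `classification_capacity` and `may_enter` are assumptions introduced for illustration; the agreement scores used (29 for AB, 16 for ABCD, 13 for EFGH, and 6 for all eight companies) are those reported in the text.

```python
def classification_capacity(members, agreement_score):
    """Number of responses classified: members in the category times its agreement score."""
    return len(members) * agreement_score


def may_enter(current_capacity, larger_capacity):
    """A category may enter a larger one only if no classification capacity is lost."""
    return larger_capacity >= current_capacity


cap_ab = classification_capacity("AB", 29)
cap_abcd = classification_capacity("ABCD", 16)
cap_efgh = classification_capacity("EFGH", 13)
cap_all = classification_capacity("ABCDEFGH", 6)

# AB may enter ABCD, but neither ABCD nor EFGH may enter the category of all eight.
ab_into_abcd = may_enter(cap_ab, cap_abcd)
abcd_into_all = may_enter(cap_abcd, cap_all)
efgh_into_all = may_enter(cap_efgh, cap_all)
```

Under these figures the analysis stops at ABCD and EFGH, as the following paragraph states.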

By the Principle of Maximum Classification, the analysis of 
Table 1 must discontinue with categories ABCD and EFGH. Each 
of these categories is considered to be a better classification for its 
respective members than is any lesser category into which the companies
classified in the analysis. Thus, Category ABCD is considered
a better classification for each of Companies A and B than is
Category AB, and likewise for Companies C and D with respect to
Category CD.

The resultant Categories ABCD and EFGH have agreement scores
of 16 and 13, respectively. Consequently, the members of the two
categories have been classified thus far on only 16 and 13 items,
respectively, out of the total of 32 items. Additional classification is
possible.


Isolating the Secondary Patterns


We now return to Table 1 to prepare for an additional classifica- 
tion of all the companies. For the additional classification, Table 1 
is revised to adjust it to what has already occurred. The agreement 
scores mediating between the pairs of Category ABCD are all reduced
by 16, the agreement score of that category, and analogously those
for Category EFGH are reduced by 13. The results are shown in Table 2.
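
The revision of the matrix can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `reduce_within_categories` and the dictionary layout are invented for the example, and the entries for AD and BD are hypothetical. The scores AB = 29, AC = 16, and BC = 17 come from the text, while CD = 26 and AE = 14 are inferred from the residual and classification figures the article reports.

```python
from itertools import combinations


def reduce_within_categories(scores, categories):
    """Subtract each category's agreement score from every pair inside it.

    Pairs whose members fall in different categories keep their original score.
    """
    residual = dict(scores)
    for members, level in categories.items():
        for pair in combinations(members, 2):
            key = frozenset(pair)
            residual[key] = scores[key] - level
    return residual


# AD and BD are hypothetical; the other entries follow the article.
pairs = {"AB": 29, "AC": 16, "BC": 17, "CD": 26, "AD": 18, "BD": 19, "AE": 14}
first_order = {frozenset(k): v for k, v in pairs.items()}
second_order = reduce_within_categories(first_order, {"ABCD": 16})
```

The pair AE is left untouched because A and E belong to different primary categories.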

In the above analysis, if we had classified ABCD with EFGH, 
they would have agreed on six items. Should we then reduce by six
all pairs realized by taking one company from each Category ABCD 
and EFGH? We did not, for this larger category was rejected by 
the method. However, to proceed in the contrary direction must be 
recognized as a possibility, and which approach is better can best be 
settled in the long run by comparative studies. 

Table 2 was obtained from Table 1 by revising Table 1 in the 
fashion just outlined. Table 2 was analyzed as follows, using the 
Self-Checking Version of Hierarchical Syndrome Analysis. 

Just as in the Replacement Version, the first step is to underline 
the highest entry in each column and then select the highest entry 
in the entire matrix. The highest entry of the matrix is 14, mediating 
between Companies AE. They are joined in a new Column AE to 
form the first category, and the entries for Column AE are determined
as in the Replacement Version. No entry is made for either
Row A-Column AE or Row E-Column AE. The entries of Column
AE are entered also in Row AE.

In the Replacement Version, both the Columns and the Rows for 
Companies A and E would be dropped at this point. In the Self- 
Checking Version only the Columns are dropped. As a result, A, for 
example, could classify into more than one pair. This would have 
occurred if A had had an agreement score of 15 with B; it would 
have first classified with B, and because it is the highest entry in 
Column E, it would have classified also with E. 

Ones have been placed above Columns A and E to show both that 
(a) Companies A and E entered the first category and (b) Columns 
A and E will not again be used in selecting the highest entry. The 
next step is to underline the highest entry in the new Column, AE. 


[Table 2: matrix of agreement scores between Companies A through H, as revised from Table 1 and analyzed by the Self-Checking Version.]



The highest entry is selected from the remaining columns, excluding
Columns A and E. It is 13 and mediates between A-B, E-B,
AE-B, H-C and C-H. There is no conflict; B and AE join into Category
ABE, and C and H join into Category CH. Since ABE is larger
than CH, its components, B and AE, are assigned Step 2 in the
analysis and those of CH, Step 3; they are so indicated by twos and
threes over the appropriate columns.

The analysis is completed by proceeding in the above general 
fashion, as described in more detail elsewhere (McQuitty, 1960). 

The present analysis indicates the comprehensiveness of the 
method in the case of Company A, which mediates in the highest score
of three other Companies B, E, and G; it leads to Categories AB, 
AE, and AG. However, AB is not shown separately because B has 
its highest score with both A and AE. Consequently, only ABE is 
shown, rather than first AB and then ABE. 

After the analysis has been completed as shown in Table 2, the 
next step is to determine which categories satisfy the principle of 
maximal classification. This is done as shown in Table 3, by com- 
puting the number of responses classified by each category; it is 
equal to the agreement score of the category multiplied by the 
number of subjects classified in the category. In building Table 3, 
we started with the categories of pairs and built each sequence of 
categories from a pair. Thus in the first part of the table we started 
with AE and built it through to Category ABEFG, using the results 
of Table 2. 

The above procedure ended with some repeats; three different 
starting points, CH, DG, and GH, all ended with the same category, 
BCDFGH. This is not a defect. Rather, it is a necessary precaution; 
a category could be retained in one sequence and not in another; 
CH, for example, might have classified best at the top level and DG 
at an earlier level. We would have classified each at its best level. 

The categories of maximal agreement are underlined in Table 3;
they are ABE, CDGH, CDEF, and ABGH.

The next step is to determine whether or not to continue the 
analysis, that is, to derive a new matrix from Table 2 in a fashion
similar to the manner in which the matrix of Table 2 was obtained 
from Table 1. 

This new matrix cannot, however, be obtained in exactly the same 
fashion as the matrix of Table 2. An additional problem is created 




TABLE 3
Number of Responses Classified by Second-Order Categories

Category    Agreement Score    No. of Companies    No. of Responses Classified

AE               14                  2                     28
ABE              13                  3                     39
ABEF              6                  4                     24
ABEFG             4                  5                     20

CH               13                  2                     26
CDH              10                  3                     30
CDGH                                 4
CDFGH                                5
BCDFGH                               6

DF                                   2
CDF                                  3
CDEF                                 4
CDEFG                                5
BCDEFG                               6

AG                                   2
ABG                                  3
ABGH                                 4
ABFGH                                5

DG                                   2
DGH                                  3
CDGH                                 4
CDFGH                                5
BCDFGH                               6

GH                                   2
DGH                                  3
CDGH                                 4
CDFGH                                5
BCDFGH                               6



because we used the Self-Checking Version (rather than the Replacement
Version) in analyzing Table 2. Consequently, C and D,
for example, entered into two maximal classification categories,
CDGH and CDEF. In cases like this, a solution is to subtract from
the present CD of Table 2 (in preparing the new matrix) the larger
agreement score of the two categories CDGH and CDEF. With this
revision, the matrix of Table 4 was prepared from Table 2.

The agreement score between C and D in Table 2 is 10, and the
agreement scores of Categories CDGH and CDEF are 9 and 8,
respectively. Consequently, nine is subtracted from 10 to give the


TABLE 4 
Third-Order Agreement Scores between Companies 


[Matrix of third-order agreement scores for Companies A through H.]


agreement score for CD in the matrix, Table 4. The rest of Table 4 
was completed in this same manner. Table 4 was not analyzed in 
this study because the method is the same as that for Table 2 and 
the main purpose of this study is to illustrate the method. In addition,
the entries in Table 4 are relatively small, suggesting that the
analysis might be discontinued at this stage.
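
The rule for a pair that falls into more than one significant category can be written as a one-line computation. The sketch below is only illustrative: the function name `residual_score` is an assumption, while the figures for the pair CD (a score of 10 in Table 2, and category scores of 9 and 8 for CDGH and CDEF) are those given in the text.

```python
def residual_score(pair_score, category_scores):
    """Residual agreement score for one pair.

    `category_scores` lists the agreement scores of every significant category
    in which both members of the pair were classified together. The largest is
    subtracted; a pair classified in no such category keeps its score.
    """
    if not category_scores:
        return pair_score
    return pair_score - max(category_scores)


cd_third_order = residual_score(10, [9, 8])   # CD appears in CDGH (9) and CDEF (8)
unclassified = residual_score(7, [])          # hypothetical pair in no common category
```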


Results and Interpretation 


Table 5 shows the classification categories in relation to the in- 
dustries of the companies. Thus, the first major category, ABCD, is 


TABLE 5
Classification Categories in Relation to the Industries of the Companies

[Table body: Companies A through H, grouped by industry, checked against the primary categories ABCD and EFGH and the secondary categories ABE, CDGH, CDEF, and ABGH.]



shown to classify together two construction and two trucking com- 
panies, and the second major category, EFGH, encompasses two 
garment companies, one grain processing, and one metal products 
company. The agreement score for the second category is less than 
for the first. 

The analysis was performed on indices of union-management re- 
lationships. The results seem to be consistent with the development 
of labor management relationships in these companies and indus- 
tries as reported elsewhere (McQuitty, 1954). 

There is also an obvious way in which the secondary categories 
are internally consistent; wherever an industry is represented by 
two companies, the two companies are invariably together, and no 
other pair is invariably together. 

Table 6 reports the patterns of responses for both the primary 
and secondary categories. Items which entered into classifying a 
company into one category were not in general involved in classify- 
ing it in another category. A and B, for example, were classified into 
three categories; only three items were common to two of their classifications
and none was common to all three. GH and CD were each
also classified into three categories; CD has no item common to any
two of its classifications and GH has only one common, and this one
item is common to only two classifications. The classifications represent
quite independent configurations of responses.

Table 6 shows also both the actual and estimated agreement scores 
for the categories. The estimated score exceeds the actual one in 
every case. The actual score can never, in fact, exceed the estimated one, for the
estimate is the upper limit possible; a category can never have more in
common than the pair with the least in common, but it can of course 
have fewer in common as illustrated in the present results. 
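
The relation between actual and estimated agreement scores can be illustrated with a small Python sketch. Everything in it is hypothetical: the three six-item response profiles are invented, and the function names are assumptions made for the example. It shows only that the number of items on which all members agree cannot exceed the lowest pairwise agreement score, and may fall below it.

```python
from itertools import combinations


def pair_agreement(x, y):
    """Number of items on which two profiles give the same answer."""
    return sum(1 for a, b in zip(x, y) if a == b)


def actual_agreement(profiles):
    """Number of items on which every profile in the category agrees."""
    return sum(1 for answers in zip(*profiles) if len(set(answers)) == 1)


def estimated_agreement(profiles):
    """Classification assumption: the lowest pairwise agreement in the category."""
    return min(pair_agreement(x, y) for x, y in combinations(profiles, 2))


# Three hypothetical companies answering six two-choice items.
profiles = ["++++++", "--++++", "+--+++"]
estimated = estimated_agreement(profiles)
actual = actual_agreement(profiles)
```

Here every pair agrees on four items, yet all three profiles agree on only three.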


A Critique of the Method 


The method is based in part on assumptions: (a) The Classification
Assumption: a category has as much in common as the pair
with the least in common; (b) The Principle of Maximum Classification:
the significant categories for the isolation of patterns are
the ones in which the agreement score times the number of subjects
in a category is maximal; and (c) The Computation of Residual
Agreement Scores: residual agreement scores are obtained for any
pair, A and B, by subtracting from the original AB agreement score


TABLE 6
Pattern of Responses for Primary and Secondary Categories

Primary Categories: ABCD, EFGH
Secondary Categories: ABE, CDGH, CDEF, ABGH

1 Worker Attitude to Company
2 Worker Attitude to Union
3 Management Attitude to Company
4 Management Attitude to Union
5 Extent of Union Influence
6 Management Satisfaction with Union Influence
7 Union Satisfaction with Union Influence
8 Speed of Grievance Settlement
9 Average Annual Earnings, 1948-49
10 Average Hourly Earnings, 1948-50
11 Sum of General Wage Increase, 1948-50
12 Per Cent Increase in General Wage Changes, 1948-50
13 Stability of Management Leadership
14 Stability of Union Leadership
15 Duration of Collective Bargaining
16 Technological Development of Establishment
17 Public Approval of Union
18 Public Approval of Management
19 Seasonal Stability of Production
20 Skill Level of Work Force
21 Ease of Management Entry into Industry
22 Ratio of Labor to Total Cost
23 Number of Hourly Employees in Establishment
24 Degree of Price Competition in Industry
25 Collective Bargaining Areas Related to Product Market
26 Extent of Local Management Autonomy
27 Extent of Local Union Autonomy
28 Degree of Management Resistance to Union Recognition
29 Change in Business Volume, 1949/48
30 Change in Business Volume, 1950/49
31 Change in Business Volume, 1948-50/1946-47
32 Change in Business Volume, 1948-50/1939-40

Actual Agreement Scores 13 10 12
Estimated Agreement Scores by the Method of Analysis 16 13 13 8 7

the highest agreement score of a significant category into which A 


and B classify. If A and B do not classify together into a significant 
category in the analysis of a matrix, then they retain for the residual 




matrix the same agreement score which they had in the original 
matrix. 

The Classification Assumption need only be relatively correct in 
order for the method to work perfectly, and this relative condition 
need hold only for the agreement scores of categories that should 
result from the classification; they must always be found by the 
assumption to be higher than any of their competitors. In case of 
doubt they can always be checked in the course of an analysis. 

The Principle of Maximum Classification is only one of many 
which might be offered. This fact is emphasized by an awareness of 
the great number of patterns which could be isolated; a test of 20 
items, each with three answer alternatives, has 3^20 patterns of answers,
a number in the neighborhood of 3.5 billion. All of these
patterns cannot be examined; there must be a selection. Selection by
predominant patterns and maximum classification is a reasonable
first approach. A limiting characteristic of this approach is that the
higher order patterns are a function of the initial group of predominant
patterns isolated. If the latter patterns are inappropriate,
the higher order patterns are going to be biased. Perhaps the best 
that can be expected from application of the method to the field of 
mental health assessment is that it will yield insights which will 
help improve the method for further study of the problem. 

The categories retained by the Principle of Maximum Classification
can always be compared with those selected by other approaches,
such as an index of statistical significance, or one of the
purity of classification, of patients versus normals, for example.

An alternative approach can be suggested for obtaining residual
agreement scores, outlined in terms of subjects X and Y, for example.
If X occurs in any significant category, irrespective of whether
or not with Y, and if n of X’s responses are involved in defining the
pattern of the significant category, then all of these n responses
which are common to X and Y must be subtracted from the agreement
score between X and Y. The technique of the present study
was used because it allows for n responses to belong to one pattern
and n − r (r < n) of these same responses to belong to another pattern;
it was assumed that two different selves within an individual
could lead to common responses. This point needs empirical investigation.

By means of the above assumptions, an analysis which might 


otherwise be complex and laborious is accomplished by a method
which is both brief and elementarily simple. The assumptions of the 
method can all be checked, and appropriately corrected in the course 
of analysis, without increasing to any considerable extent the com- 
plexity of the analysis. 

One important caution with respect to all methods of this kind is 
that the results are a function of the data as well as the method of 
analysis; the categories obtained depend on both the data and the 
method. More specifically, the data determine the agreement scores; 
if these scores do not derive from data which reflect relationships in 
mental health, for example, then the categories of the analysis will 
not classify people in relation to mental health. Success of the
method assumes some subject-matter facility in selecting data. With
some facility present in this respect, results from the method offer
the possibility of helping to yield insights which will improve either
data selection, the statistical methodology, or both, and thus in the 
end produce greater substantive understanding. 

The value of the method, in relation to our present substantive
insights, will depend in the final analysis on whether the method
yields both reliable and meaningful results. Conclusions on these
points must await applications in a variety of studies. The method
might, for example, prove much more adequate for the classification
of institutions than individuals.


Summary 


This paper outlines a theory of the manner in which mental illness
expresses itself in behavior, and develops an objective method for
analyzing behavior in such a fashion as to assist in investigating the
theory and helping to revise it or the method or both, to the end that
some progress may eventually be achieved toward objective assessment
of psychological well-being.

Whether or not a response is indicative of mental illness or mental 
health depends on the combination of other responses with which it
occurs. For every time in which a response occurs in a configuration
in which it indicates mental illness, it also occurs in one or more 
other configurations in which it indicates mental health. 

With the above limitation of the dependability of single responses
for indicating mental illness, we can still define as symptoms of
mental illness all those responses which occur as parts of configurations
which are associated uniquely with illness.

Each patient uses only a few of the many symptoms of mental 
illness as a channel through which to express his disease; these compose
a configuration which is unique to mentally ill patients. Every
patient also reflects many more other symptoms, but these generally 
occur in larger combinations and are associated with healthy as well 
as ill persons. 

A method of analysis is developed which classifies persons (or 
institutions) first on the basis of their predominant patterns of re- 
sponses, then subtracts these out and reclassifies on secondary pat- 
terns of responses, repeating with higher and higher order patterns 
until nearly all responses have been used and the subjects have been 


classified as many times as necessary to account for their responses.
It is hypothesized that almost every patient will eventually classify in
a category which contains primarily only patients; the patient-pure
categories will appear more frequently in the higher order steps of
the analysis.

This paper does not investigate the hypotheses; it merely de- 
velops a statistical method for testing them, and it illustrates the 
method by application to the classification of companies, using 
variables from the field of union-management relationships. Companies
(institutions) were used instead of people for the illustration
because they appear to be more highly structured, and the method
could thereby be illustrated with relatively few cases.

After a matrix of interassociations between people (or institu- 
tions) has been prepared, the method requires no more complex 
mathematics than recognizing which of two numbers is larger and
elementary subtractions. 


REFERENCES 


McQuitty, L. L. “Pattern-Analysis: A Statistical Method for the
Study of Types.” In W. E. Chalmers, M. K. Chandler, L. L.
McQuitty, R. Stagner, D. E. Wray, and M. Derber (Editors),
Labor Management Relations in Illini City, Volume II. Champaign,
Illinois: Institute of Labor and Industrial Relations,
University of Illinois, 1954.

McQuitty, L. L. “Hierarchical Syndrome Analysis.” EDUCATIONAL
AND PSYCHOLOGICAL MEASUREMENT, XX (1960), 293-304.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 3, 1962


RESPONSE SET IN A MULTIPLE-CHOICE TEST¹


LEONARD WEVRICK²
University of Sydney 


The possibility of the existence of response sets in the multiple-choice
test has aroused recurrent interest during the past three or
four decades. The bias which has received the greatest attention is
that of positional preference. Cronbach (1950) suggests that the
multiple-choice test is comparatively free of sets. There is some
evidence (Rapaport & Berg, 1955) which tends to support this contention.
There are, however, several studies which seem to refute this
conclusion. McNamara and Weitzman (1945) report positional
preferences in the multiple-choice item. Studies using highly unstructured
tests (Berg & Rapaport, 1954; Goodfellow, 1940; Nunnally
& Husek, 1958; Whitfield, 1950) demonstrate the existence
of a nonrandom preference for certain positions or responses in
objective-type tests.

The types of psychometric instruments used in the studies resulting
in positive findings were “questionless questionnaires,” “imaginary
inventories,” and similar devices. Only the McNamara and
Weitzman data were obtained from a “real” objective examination,
and the effects obtained, while statistically significant, were of a
relatively small magnitude. In general, it might be concluded that
Cronbach’s assertion as to the negative association between the degree
of structure and the possible resulting bias is supported by the
existing evidence.
After reviewing the studies cited, and others, it was felt that conditions
might exist, or could be contrived, in which positional response
bias would be demonstrated in a highly structured multiple-choice
test. An analysis of the effects of positional bias on item and
test statistics led to the conclusion that it would be necessary, on
some a priori basis, to design a test in such a manner as to produce
a specified set which would be common to all or most subjects. A
discussion of this technique follows in later sections.

¹ This paper was supported by a University of Sydney Research Studentship.
² Now at Laurentian University.


The Effect of Positional Bias on Item and Test Parameters 


If we assume that the sequence of correct and incorrect alterna- 
tives forms an Irregular Collective (von Mises, 1957), and if we
have a case in which some positional set exists (this set being the 
same for all subjects, varying only in intensity) the following hy- 
potheses are proposed: 


1. For any given item, if the correct alternative occupies the biased
position, then the proportion of correct responses given to that item
will increase; and if a distractor occupies the biased position, the
proportion of correct responses made to that item will decrease.

2. For any given test, if the position of the correct alternative is 
randomly distributed across items, a response set will not influence 
the obtained total score distribution. 


In order to test the first hypothesis it was necessary to violate
one of the conditions of the second, random distribution of completors.
Two methods of reinforcing the selection of a position were
used in the attempt to create the same set for all subjects. These do
not preclude the testing of the first hypothesis. The term reinforcement
is here used to refer to those conditions which increase the
probability of a given response.


Design of Study 

A pool of 100 five-alternative vocabulary items was constructed,
the majority of the items being adapted from existing tests. These
items were then arranged in order of descending difficulty (difficulty
being defined as the proportion passing an item), the item difficulties
being determined on the basis of previous data. Three tests were
constructed from this sequence of items. The only feature differentiating
these tests was the position of the correct alternative.
These tests were:


Test A. The position of the completor was randomly distributed
across the five alternatives in the conventional manner. This test
provided the base levels of item difficulties.

Test B. The position of the second alternative was chosen as that
for which a set was to be established. Each item was then assigned
a probability of having position 2 occupied by the completor. The
probability assigned was a function of the serial position of the item
in the established sequence. Thus, for item 1 the probability of position
2 being the correct alternative was 1.00, for item 2 the associated
probability of position 2 was 0.99, etc., until item 100, for
which the probability of position 2 being correct was 0.01. Tables of
random numbers were used to accomplish this task. After each item
had been assigned a probability of having position 2 as the correct
alternative, the tables were entered. If the two-digit number found
exceeded the required probability, the completor was randomly
placed in a position other than two. If the number found was equal
to or less than the required probability, the correct alternative was
placed in position two. Since the very easy items occurred at the
beginning of the test, this system ensured an initial knowledge of
results which was necessary for the reinforcement of the set.

Test C. The correct alternative was placed in position 2 for the 
first 25 items. The remaining 75 items were identical to those in 


test A. This test was designed as an alternative method of inducing
a set.
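
The placement rule described for Test B can be sketched in Python. This is a hedged illustration, not the procedure actually carried out with printed tables: a seeded pseudo-random generator stands in for the tables of random numbers, the draw is taken as an integer from 1 to 100 (an assumption about how two-digit entries were read), and the function name `place_completor` and its parameters are invented for the example.

```python
import random


def place_completor(serial_position, rng, n_items=100, n_alternatives=5,
                    biased_position=2):
    """Return the position of the correct alternative for one item of Test B.

    The required probability, in hundredths, is 100 for item 1, 99 for item 2,
    and so on down to 1 for item 100.
    """
    required = n_items + 1 - serial_position
    draw = rng.randint(1, 100)  # stands in for a two-digit table entry
    if draw <= required:
        return biased_position
    # Otherwise the completor is placed at random in one of the other positions.
    others = [p for p in range(1, n_alternatives + 1) if p != biased_position]
    return rng.choice(others)


rng = random.Random(1962)  # arbitrary seed, so the sketch is repeatable
answer_key = [place_completor(i, rng) for i in range(1, 101)]
```

Because the required value for item 1 is 100, the first item always has its completor in position 2, and the proportion of items keyed on position 2 falls steadily through the test.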


Administration and Scoring 


The three tests were administered as vocabulary tests to groups 
of university freshmen, the tests being randomly distributed in each 
group. Each subject responded to only one test. The time allotted
was 10 minutes and only seven subjects out of 256 failed to complete
all items. The data from these seven subjects were discarded. The
numbers of subjects used in the final analysis are shown in Table 1.


TABLE 1 


Means, Standard Deviations, and 
Kuder-Richardson Formula 20 Reliabilities for Three Vocabulary Tests 


Test     N     Mean    S.D.     r
A        84     76      10     .87
B        82     76      12     .92
C        83     77      11     .90


Analysis and Results


The means, standard deviations, and Kuder-Richardson Formula
20 reliabilities were calculated for each test. These data are presented
in Table 1. It can be seen that there were no appreciable differences in
these statistics for the three tests.

In order to test the hypothesis that a set would increase an item 
difficulty if the correct alternative were in the biased position, and 
decrease the difficulty if in any other position, an item by item comparison
of changes in difficulty was carried out. The item difficulties
obtained from test A were used as standards. The results of this
analysis for tests B and C are shown in Tables 2 and 3. Table 2
clearly indicates the association between the item difficulty and a 
bias for position 2 in test B. The results for test C also indicate the 
presence of a set. 
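
The chi-square value reported with Table 2 can be checked from the legible cells. The sketch below is an illustration under an explicit assumption: the row for items with position 2 incorrect is recovered by subtracting the legible first row (31 and 9) from the column totals (50 and 41). The function name `chi_square_2x2` is introduced for the example, and the statistic is the ordinary uncorrected chi-square for a fourfold table.

```python
def chi_square_2x2(a, b, c, d):
    """Uncorrected chi-square for the fourfold table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


row_correct = (31, 9)        # position 2 correct: increase, decrease
column_totals = (50, 41)     # all items: increase, decrease
# Assumption: the second row is recovered from the marginal totals.
row_incorrect = (column_totals[0] - row_correct[0],
                 column_totals[1] - row_correct[1])

chi_square = chi_square_2x2(*row_correct, *row_incorrect)
```

Rounded to one decimal, the result agrees with the value of 14.7 printed with the table.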


TABLE 2
Changes in Item Difficulty Due to Positional Bias, Test B

                            Difficulty
                        Increase   Decrease   Total
Position 2 correct         31          9        40
Position 2 incorrect       19         32        51
Total                      50         41        91

χ² = 14.7, 1 df
p < 0.001

TABLE 3
Changes in Item Difficulty Due to Positional Bias, Test C

                            Difficulty
                        Increase   Decrease   Total
Position 2 correct         26          8        34
Position 2 incorrect       20                   55



During the analysis of the data, the possibility arose that a word
preference, rather than positional preference, could account for some
of the results. In order to check for this possible artifact, a comparison
of the popularity (proportion of persons selecting) of the word
in position 2, tests B and C, was made against the popularity of that
same word when it occurred in test A. This was done for all items
and the increase or decrease in popularity noted. These results are
presented in Table 4. It can be seen that the placing of a word in
position 2, for which a set was being established, increased the popularity
of that word. This change was quite clear in test B but doubtful
in test C.


TABLE 4
Comparison of Popularity of Word in Position 2, Tests B and C,
with Popularity of Same Word in Test A

          Change in     Number of
Test      Popularity    Items
          increase         75        t = 6.28
B         decrease         15        p < 0.001
          nil              10
          increase         51        t = 1.5
C         decrease         37        p < 0.07
          nil              12

Discussion 


These results indicate that it is possible to induce a response set 
for a particular position in a multiple-choice test. A comparison of
the two methods employed strongly indicates the superiority of the 
method of partial reinforcement (test B) as a set inducer. 

The failure of test C to yield clear results can be explained if one 
postulates a rapid extinction of the set after the 25th item. A subsequent
analysis of the data tended to show this extinction effect as
well as some spontaneous recovery of the set in the last few items
of the test. Since the latter data did not yield conclusive results, 
they are not reported in detail. 

It might be noted that the method described could be used to
generate other types of sets and thus allow them to be studied
experimentally. The fact remains, however, that if a test is properly
designed, in that the position of the correct alternative is
randomized over items, no positional response bias will influence the
obtained total score for a testee.
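The argument about randomized placement can be illustrated with a short expected-value calculation. The sketch below is not taken from the study: it assumes four alternatives per item, considers only blind guessing, and uses hypothetical names and probabilities. The chance that a guess is correct is the sum over positions of P(testee chooses position) times P(key is at position).

```python
from fractions import Fraction

def expected_guess_score(choice_probs, key_probs):
    """Probability that a blind guess is correct, given position preferences
    of the testee and the distribution of the keyed position over items."""
    return sum(c * k for c, k in zip(choice_probs, key_probs))

uniform_key = [Fraction(1, 4)] * 4  # key randomized over four positions
no_bias = [Fraction(1, 4)] * 4      # testee with no position preference
strong_bias = [Fraction(1, 10), Fraction(7, 10), Fraction(1, 10), Fraction(1, 10)]
skewed_key = [Fraction(1, 10), Fraction(7, 10), Fraction(1, 10), Fraction(1, 10)]

print(expected_guess_score(no_bias, uniform_key))      # 1/4
print(expected_guess_score(strong_bias, uniform_key))  # 1/4
print(expected_guess_score(strong_bias, skewed_key))   # 13/25
```

With a randomized key the biased and unbiased guessers have the same expected score. Only when the key itself favors position 2, as in the partial reinforcement condition, does a preference for that position raise the score.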


Summary 


1. An analysis of the effect of positional response set on item and
test parameters shows that this form of bias does not attenuate test
data under the condition of random placement of the item completor.

2. Under certain experimental conditions a positional response set 
can be clearly demonstrated in an objective test. 

3. An efficient method of inducing positional bias is the partial
reinforcement of the desired alternative position by having the
correct alternative occupy that position with a greater than chance
frequency throughout the test.


REFERENCES 


Berg, I. A. and Rapaport, G. M. “Response Bias in an Unstructured
Questionnaire.” Journal of Psychology, XXXVIII (1954), 475-

Cronbach, L. J. “Further Evidence on Response Sets and Test
Design.” EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, X
(1950), 3-31.

Goodfellow, L. D. “The Human Element in Probability.” Journal
of Educational Psychology, XXVI (1945), 103-113.

Mises, R. von. Probability, Statistics and Truth. London: George
Allen and Unwin, 1957.

Nunnally, J. and Husek, T. R. “The Phony Language Exam: An
Approach to the Measurement of Response Bias.” EDUCATIONAL
AND PSYCHOLOGICAL MEASUREMENT, XVIII (1958), 275-282.

Rapaport, G. M. and Berg, I. A. “Response Sets in a Multiple-
Choice Test.” EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT,
XV (1955), 58-62.

Whitfield, J. W. “The Imaginary Questionnaire.” Quarterly Journal
of Experimental Psychology, II (1950), 76-87.




EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 3, 1962


THE VALIDATION OF AN ABBREVIATED WECHSLER 
INTELLIGENCE SCALE FOR CHILDREN FOR USE 
WITH THE EDUCABLE MENTALLY RETARDED 


JACK M. THOMPSON AND CARMEN J. FINLEY
Sonoma County Schools 


IN a recent article Finley and Thompson (1958) proposed an 
abbreviated Wechsler Intelligence Scale for Children (WISC) for 
use with educable mentally retarded children. The abbreviated scale 
consisted of five subtests: information, picture arrangement, pic- 
ture completion, coding, and block design. The abbreviated scale 
gave a multiple correlation coefficient of .896 with full scale IQ. 
The standard error of estimate in predicting full scale IQ was 
4.307 scaled score units or three IQ points. 
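The figure of about three IQ points is consistent with the usual relation between a multiple correlation and the standard error of estimate, SEE = SD * sqrt(1 - R^2). The sketch below is only a plausibility check, not the authors' computation: it assumes that the standard deviation of actual full scale IQ in the standardization group (7.06, as given in Table 1 of this article) is the relevant SD, and the function name is hypothetical.

```python
import math

def standard_error_of_estimate(sd_criterion, multiple_r):
    """Standard error of estimate implied by a criterion SD and a multiple R."""
    return sd_criterion * math.sqrt(1.0 - multiple_r ** 2)

# R = .896 from the text; SD = 7.06 is the standardization-group value from Table 1.
see_iq = standard_error_of_estimate(7.06, 0.896)
print(round(see_iq, 1))  # 3.1
```

The result, roughly 3.1 IQ points, agrees with the "three IQ points" quoted above.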

The purpose of this article is to test the hypothesis that the
previously proposed abbreviated scale is a valid predictor of full
scale IQ for other similar mentally retarded children. This is
essentially a cross-validation.

Although numerous studies have appeared in the literature on 
abbreviated intelligence scales, only Cotzin and Gallagher (1949) 
made a validation of their proposed abbreviated scale with a popu- 
lation similar to the original population. 


Subjects 


One hundred seventy-three WISC protocols of educable mentally
retarded children comprised the data for this study. Protocols were
selected on the basis of the same criteria as used for the
standardization population. The criteria were:


a. The subject must have been determined eligible for placement
in a special education class for the mentally retarded as
defined by the California Education Code, (1), Section 9801.1.
b. Chronological age of the subject at the time of test
administration was not less than 8-0, nor more than 13-6, the lower
limit being chosen because Coding B is administered only to
subjects of 8-0 years of age and beyond, whereas younger
children are administered Coding A. The upper limit was arbitrarily
chosen because of the setting in which the authors work: this is
the upper age limit for children placed in special classes at the
elementary school level.
c. Full scale IQ of the subject was between 50 and 80. These
limits were chosen because this is the usual range of intelligence
for children to be considered as possible special class candidates
in this county. In terms of the WISC classification (1949), this
includes the borderline and mentally defective intelligence
groups.
d. The subject has been given all the ten usual subtests, that is,
information, comprehension, arithmetic, similarities, vocabulary,
picture completion, picture arrangement, block design, object
assembly, and coding.

Both boys and girls are included in these data. 


The mean full scale IQ of the validation population was 67.92 
with a standard deviation of 6.75, which is not significantly different 
from the standardization group. 


Method 


For each subject in the validation group, a predicted full scale IQ 
was computed using the table of weighted subtest scores developed 
on the standardization group. 

Three criteria were selected for investigation of the standardiza- 
tion and validation populations: 


a. Mean difference between predicted and actual full scale mean IQ
scores.
b. Correlation between predicted and actual full scale IQ scores.
c. Analysis of the amount of deviation between predicted and actual
full scale IQ scores.
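The three criteria can be sketched in a few lines. This is only an illustration with made-up scores: the article does not state the form of its significance test, so a paired t ratio is assumed here, the deviation summary is assumed to use absolute deviations, and all names are hypothetical.

```python
import math
from statistics import mean, stdev

def validation_criteria(actual, predicted):
    """Summarize agreement between actual and predicted full scale IQs."""
    diffs = [p - a for a, p in zip(actual, predicted)]
    n = len(diffs)

    # (a) Mean difference, with a paired t ratio (assumed form of the test).
    mean_diff = mean(diffs)
    t_ratio = mean_diff / (stdev(diffs) / math.sqrt(n))

    # (b) Pearson product-moment correlation.
    m_a, m_p = mean(actual), mean(predicted)
    cov = sum((a - m_a) * (p - m_p) for a, p in zip(actual, predicted))
    ss_a = sum((a - m_a) ** 2 for a in actual)
    ss_p = sum((p - m_p) ** 2 for p in predicted)
    r = cov / math.sqrt(ss_a * ss_p)

    # (c) Size of the deviations, ignoring sign.
    abs_dev = [abs(d) for d in diffs]
    return {
        "mean_difference": mean_diff,
        "t": t_ratio,
        "r": r,
        "deviation_range": (min(diffs), max(diffs)),
        "mean_abs_deviation": mean(abs_dev),
        "share_within_4": sum(d <= 4 for d in abs_dev) / n,
    }

# Hypothetical IQs for five children, for illustration only.
result = validation_criteria([60, 65, 70, 75, 68], [62, 64, 71, 73, 69])
print(result["deviation_range"], result["share_within_4"])  # (-2, 2) 1.0
```

The function returns the quantities that correspond to the mean difference, the correlation, and the deviation statistics examined in the study.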


Results and Discussion 


Table 1 shows the significance of mean differences and correlation
between predicted and full scale IQ scores for both the validation
and standardization populations.


TABLE 1
Significance of Mean Differences and Correlation Between
Predicted and Full Scale IQ for Validation and Standardization Populations

                          Validation Population    Standardization Population
                          Actual     Predicted     Actual     Predicted
1. Mean IQ                67.92      67.93         67.77      67.87
2. Standard Deviation      6.75       6.06          7.06       6.42
3. t                            .037                     .549
4. Correlation                  .855                     .892

As indicated in Table 1, there were no significant differences be- 
tween predicted and full scale IQ scores for either the validation or 
the standardization groups. 

The correlation between predicted and full scale IQ scores for the
validation group was of nearly the same magnitude as for the
standardization group. Although, as could be expected, the
correlation was not as high for the validation population, it was high
enough to indicate that the predicted scores are of value.

Table 2 gives the statistics of deviations between predicted and 
actual full scale IQ scores for the validation and standardization 
populations. 

As indicated in Table 2, the mean deviation for the validation
population was only slightly higher than for the standardization
population. In the validation group, 80 per cent of the cases deviated
by four or fewer IQ points, while 85 per cent of the standardization
group deviated by four or fewer IQ points.

TABLE 2
Deviations Between Actual and Predicted Full Scale IQ
for Validation and Standardization Populations

                          Validation     Standardization
Range of Deviations       -8 to +9       -11 to +9
Mean of Deviations        2.79           2.47
Standard Deviation        2.08           2.02
Number                    173            309


The results indicate that the validation population meets the
criteria for establishing the validity of the previously proposed
abbreviated WISC for use with educable mentally retarded children.
There was a nonsignificant difference between predicted and actual
full scale IQ scores. The correlation between predicted and full
scale IQ scores was high and the amount of deviation was small.
Thus, the hypothesis of the validity of the abbreviated scale was
supported.


Summary 


The validation of the previously proposed abbreviated WISC was 
attempted by the use of 173 WISC protocols selected on the basis of 
the same criteria as for the standardization population. Three meth- 
ods were selected to establish the validity: (a) difference between 
predicted and full scale IQ scores, (b) correlation between predicted
and full scale IQ scores, and (c) analysis of the amount of deviation
between predicted and full scale IQ scores. 

The validation population met all three of the criteria and the 
abbreviated WISC was presented as a valid predictor of full scale 
IQ for use with educable mentally retarded children. 


REFERENCES 


Cotzin, M. and Gallagher, J. J. “Validity of Short Forms of the
Wechsler Bellevue Scale for Mentally Defectives.” Journal
of Consulting Psychology, XIII (1949), 357-365.

Finley, Carmen J. and Thompson, J. “An Abbreviated Wechsler
Intelligence Scale for Children for Use With Educable Mentally
Retarded.” American Journal of Mental Deficiency, LX
(1958), 473-480.

State of California, Education Code. Sacramento: California State
Documents Section, Printing Division, 1955.

Wechsler, D. Manual for Wechsler Intelligence Scale for Children.
New York: Psychological Corporation, 1949.




EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 3, 1962


A STUDY OF THIRTY-FIVE PERSONALITY DIMENSIONS¹


ANDREW L. COMREY 
The University of California at Los Angeles 


Since its inception, the main focus in personality measurement 
has been upon broad, factorially complex measures. In many in- 
stances, particularly with projective methods, quantitative assess- 
ment has been subordinated to descriptive interpretations based 
upon clinical intuition. Early attempts at objective personality 
assessment, e.g., the Minnesota Multiphasic Personality Inventory,
produced scales which proved to be measuring several relatively 
independent variables simultaneously rather than single dimensions 
well described by scale names (Comrey, 1957, 1959). Besides the 
fact that several relatively independent entities were being included 
under one rubric, there has been the disturbing fact that many 
scales presumably measuring different things and having different 
names show a high degree of overlap. A correlation of .86, for 
example, has been reported between the Sc and Pt scales of the 
MMPI (Wheeler, et al., 1951). This much overlap between scales 
within a given system is inefficient unless the scales are combined 
to give one total score. Furthermore, it creates some confusion as to 
what is being measured when the scale names suggest that different 
traits are being assessed. 

The MMPI is not being singled out here as a unique example,
but rather because it is typical of what has happened with many
personality inventories. Scale after scale has been developed which
not only does not measure a single trait, but correlates to a very
substantial degree with other scales having very different names.
This phenomenon has been especially noticeable where criterion
groups have been used to select items. For example, selection of
items which discriminate between a “depressed” and a “not
depressed” group for the purpose of measuring “depression” will
generally result in a factorially complex scale.

¹ The computations for this study were carried out on the IBM 709, operated
by the Western Data Processing Center, UCLA. Support for the research came
from a grant by the University of California. The author is indebted to Mrs.
Carol Afflerbach Vale, who wrote many of the items used and helped with the
data collection, and to Mrs. Beth Schlesinger, who helped with the data
analysis.

Recognizing the difficulties of the criterion group approach, factor
analytically oriented personality research workers have sought to
define personality variables by analysis of the interrelationships of
the items themselves, rather than by selecting items which
discriminate between two groups of people. Collections of items have been
factor analyzed to find out which items bear a statistical
relationship to each other. Although these methods have resulted in a
considerable improvement in the homogeneity of those things classified
under the same trait name, while at the same time reducing the
degree of overlap between traits with different names, further
improvement is still possible.

One area of possible improvement lies in the method of developing
items for measuring personality traits. Many investigators have
used collections of items developed without any particular attention
to the relation of these items to possible factors to be measured, or
indeed, to each other. Thus, an unsystematic collection of items is
assembled, factor analyzed, and the results eagerly inspected for the
emerging, hitherto unsuspected, factors of personality. While such a
procedure may result in the identification of an important
personality factor, it has certain disadvantages. For example, only by
accident would one obtain enough good items to measure a given
factor adequately. Secondly, a factor derived from an unsystematic
collection of items often fails to emerge when another investigator
repeats the experiment with different subjects and a different
accompanying collection of items. This is particularly apt to happen
when the factor loadings were not very high originally. Finally, this
unsystematic approach has the disadvantage of requiring a
fortuitous compilation of items if any particular personality factor is to
emerge. It is reasonable to suspect, therefore, that in using it many
important factors would be missed, or at least not strongly identified.

In an attempt to mitigate some of these difficulties, the author has
suggested that highly homogeneous items should be written
specifically to measure a particular defined dimension of personality
(Comrey, 1961). Such a dimension may be suggested by intuitive
processes, theory, or previous research. After its verbal definition,
the investigator seeks to develop a collection of items which should
measure this specific dimension. Subsequent factor analyses of these
items, in conjunction with items designed to measure other
dimensions, will reveal the extent to which the investigator has indeed
been successful in defining and measuring a single trait, or trait
complex. If all the items developed, for example, have high factor
loadings on a single emerging factor, success has been achieved. He
has developed what will be called a “factored homogeneous item
dimension.”

Thus, by deliberate design, the investigator hypothesizes and
attempts to verify the existence of a factor in a given area. Various
regions of human personality can be systematically and exhaustively
studied with this technique. For each important factor, development
of several items with substantial factor loadings is usually possible
through refinement and reanalysis. Furthermore, confirmation of
findings by other investigators is more likely than it would be with
a less systematic approach since a factor has been hypothesized and
verified rather than merely observed as an unanticipated outcome.
Three previous papers have dealt with earlier applications of this
factored homogeneous item dimension approach to personality
measurement (Comrey, 1961; Comrey & Soufi, 1960, 1961). The
present paper reports the results of a study in which refinements
have been introduced and additional dimensions have been added
for investigation.


Procedure 


Thirty-five personality variables were defined for measurement
in this study. Some of these dimensions are refinements of those
studied previously (Comrey & Soufi, 1960, 1961), and some were
added for the first time in the present investigation. Many of the
variables have been suggested or studied in one form or another by
some other investigator, e.g., Guilford, at some previous time
(Cattell, 1957; Guilford, 1959). One of these dimensions was designed as
a “truth” scale to indicate the extent to which the respondent is
willing to tell things about himself which are not particularly
flattering. This scale, consisting of eight items, was not intended to be
highly internally consistent, although it did exhibit a fair degree of
homogeneity. The other 34 dimensions were designed to be
homogeneous and to measure specific personality traits, six items being used
to constitute each scale. Four items formed a validity check to
detect if the respondent had marked the answer sheet in an
irrational manner.

The 216 items were arranged in a booklet in such a way that the
items measuring a given dimension were well separated. The
subjects answered each question using one of the following three scales:
(X) A. Always B. Usually C. Occasionally D. Rarely E. Never
(Y) A. Very frequently B. Frequently C. Occasionally D. Rarely
E. Never
(Z) A. Definitely B. Probably C. Possibly D. Probably not
E. Definitely not.
These three answer scales were printed at the top of each
questionnaire booklet page. A letter X, Y, or Z followed each item number
in the booklet to indicate to the subject which set of response
alternatives he should use.

The Questionnaire. The thirty-five dimensions will be listed
below. Each dimension name is followed in parentheses by the mean
score on the group tested, the standard deviation, and the reliability
coefficient for the dimension, respectively. A sample item also
appears with each dimension. To estimate reliability, the average
product-moment coefficient of correlation between items within the
dimension was computed, thus estimating the reliability of a
one-item test. This coefficient was corrected by the Spearman-Brown
formula to a test of the appropriate number of items, usually six.
This method tends to give a conservative estimate of the proportion
of true variance in the dimension total scores. In scoring items, an
“A” response received a score of “5,” a “B” response “4,” and so on,
down to “1” for an “E” response. Dimension scores were obtained
by adding the item scores for items scored on that dimension. The
dimensions will be presented in four groups, since analyses to be
described were based on this breakdown.
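The reliability estimate and the scoring rule described in this paragraph can be sketched as follows. The Spearman-Brown formula itself is standard, r_k = k r / (1 + (k - 1) r). The average inter-item correlation of .30 and the response string are illustrative assumptions, not values from the study, and the function names are hypothetical.

```python
def spearman_brown(mean_item_r, n_items):
    """Step up the reliability of a one-item test to a test of n_items items."""
    return n_items * mean_item_r / (1 + (n_items - 1) * mean_item_r)

# Scoring key from the text: "A" = 5, "B" = 4, down to "E" = 1.
RESPONSE_SCORES = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}

def dimension_score(responses):
    """Sum the item scores for the items belonging to one dimension."""
    return sum(RESPONSE_SCORES[r] for r in responses)

# Illustrative values only: an average inter-item r of .30 over six items.
reliability = spearman_brown(0.30, 6)
print(round(reliability, 2))      # 0.72
print(dimension_score("ABCDEA"))  # 20
```

With six items scored 1 to 5, a dimension score can range from 6 to 30, which is the scale on which the means reported for the six-item dimensions should be read.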


Group I dimensions:

2. Succorance (16.7, 3.9, .79); 2(X). I depend on people to help me with my [...]
8. Agitation (mean and standard deviation illegible, .77); 8(Y). I feel that if I don't get up and do some [...]
9. Irresponsibility (13.7, 3.4, .67); 9(Y). I wish I could get away from all my
ties and responsibilities.
14. Depression (12.0, 3.8, .84); 14(X). I feel blue and depressed.
15. Sensitivity (17.5, 4.5, .78); 15(Y). I feel upset even by slight criticism.
17. Shyness (17.1, 4.4, .81); 97(X). In a social gathering, I would rather watch
than be a performer.
24. Paranoia (12.6, 3.0, .67); 104(Y). I feel that people criticize me unfairly.
25. Restraint (22.1, 3.2, .65); 105(Z). I would go to great lengths to avoid
making a scene in public.
27. Inferiority Feelings (13.2, 3.3, .71); 107(X). When meeting people, I am
afraid that I will not be able to keep up a conversation.
28. Confidence (23.2, 3.1, .63); 108(Z). I feel I can do whatever I really put
my mind to.
35. Inattentiveness (15.3, 3.0, .63); 115(Y). I misplace things and have a
hard time finding them.



Group II dimensions:

4. Benevolence (24.8, 3.2, .71); 4(X). Even if I were in a hurry, I would take
time to help a blind man across a street.
11. Friendliness (23.0, 3.5, .60); 11(X). I try to enlarge my circle of friends.
12. Rhathymia (13.3, 4.2, .75); 12(Z). I think the most important thing in life
is to enjoy oneself.
16. Cynicism (15.1, 4.1, .82); 16(Z). Most public officials would accept bribes
if they were large enough.
19. Hostility I (10.9, 2.8, .68); 99(X). Even when someone does a good job, I
find something to criticize.
26. Psychopathic Personality (8.7, 3.1, .78); 106(Z). If a person is stupid
enough to be cheated, he deserves to be cheated.
32. Impulsiveness (16.3, 3.0, .55); 112(Y). I rush headlong into things without
knowing quite what I'm getting into.
33. Sex Drive (15.5, 4.3, .76); 113(Z). I hope that I will always be interested in
sex.
34. Hostility II (9.3, 3.2, .80); 114(Z). Most people are a burden to society.



Group III dimensions:

7. Welfare of Loved Ones (22.3, 5.3, .87); 7(Z). I try to give my family every
possible advantage.
18. Ascendance (21.7, 2.9, .47); 98(X). If someone criticizes me unfairly, I call
it to his attention.
20. Drive to Finish (20.8, 2.8, .58); 100(X). When I start a job, I finish it.
21. Need for Security (23.0, 3.4, .67); 101(Z). I would rather have a modest
income I can count on than a large but unsteady one.
23. Need for Conformity (18.3, 3.4, .67); 103(X). I feel better doing what
everyone else is doing.
29. Need for Order (19.9, 3.1, .52); 109(Z). It upsets me to be in a messy
house.
31. Love of Food and Drink (17.4, 3.3, .52); 111(Z). I enjoy eating more than
most other people.
36. Need for Social Approval (19.1, 4.0, .70); 116(Z). It is important for me to
be accepted in my community.

Group IV dimensions:

1. Social Self-Sufficiency (21.6, 3.9, .70); 1(X). In my spare time I can find
things to do that will occupy me.
3. Need to Excel (21.0, 4.4, .76); 3(Z). It has always been very important for
me to get ahead in life.
5. Thoughtfulness (20.6, 4.2, .78); 5(Y). I think about philosophical issues.
6. Taciturnity (21.2, 3.0, .63); 6(Z). It is unlike me to make statements which
I later regret.
13. Need for Freedom (20.1, 5.2, .67); 13(X). I like to be free of any demands
on my time.
22. Aggression (10.9, 3.5, .74); 102(Y). When I get mad, I break things.
30. Truth Scale (28.8, 6.3, .87); 10(Z). I have gossiped about people I know.


The Sample. An attempt was made to obtain a broad, heterogeneous
collection of individuals for the analysis in order to insure
adequate variance on all personality dimensions. Subjects were
drawn from two principal sources. First, fifty residential blocks
from Santa Monica, California, were selected at random. Test
booklets and instructions were delivered to every third dwelling
unit. Students in UCLA psychology classes, their friends, and
family, constituted the other principal source of subjects. All
subjects were promised an analysis of their results for completing the
questionnaire. Altogether, 436 valid records were collected and
analyzed.

Some relevant data on the sample are: student males, 74; student
females, 114; non-student males, 94; non-student females, 154. Age:
less than 20, 100; 20-29, 167; 30-39, 52; 40-49, 49; 50-59, 40; 60
and over, 24; no response, 4. Highest school grade completed: less
than 12, 23; twelfth grade, 90; one to three years college, 206;
college graduate, 63; some graduate work, 27; no response, 27.
Marital status: single, 237; married, 155; divorced, 18; separated,
5; widow(er), 16; no response, 5. Religion: Protestant, 225;
Catholic, 54; Jewish, 91; none, 43; other, 6; no response, 43. Political
preference: Republican, 137; Democrat, 168; Independent, 84;
other, 4; and no response, 43.

Intercorrelations among the 35 dimensions were computed
separately for student females, student males, non-student females, and
non-student males. Differences between groups were not regarded
as sufficient to warrant separate analysis. All groups were combined,
therefore, for the analyses to be reported.

Factor Analyses of Items. Dimensions were divided into the four
groups described above for the purpose of carrying out factor
analyses of items. Thus, the 66 items for the 11 dimensions in
Group I were intercorrelated, using the Pearson product-moment r,
and factor analyzed by the minimum residual method (Comrey, in
press). This method gives results very similar to those obtained
with the centroid and principal components methods. The factors
were rotated by the normal varimax method (Kaiser, 1958). The
entire procedure was carried out separately for the dimensions in
Groups I, II, III, and IV. Separation of the dimensions into four
groups was necessary because a factor analysis of the entire item
matrix would have been impractical.

The number of factors extracted and rotated in these four
analyses were 14, 13, 12, and 12, respectively. As hypothesized,
every one of the 35 dimensions emerged as a factor with at least
one loading of .5 or more in one of the four analyses. Only one factor
not hypothesized emerged from any of these analyses with a loading
of .5 or more, although Shyness did unexpectedly split into two
factors. All the remaining factors were insignificant residual factors.

Twelve of the dimensions studied emerged as factors with four or
more loadings which approached or exceeded .5; namely, 1. Social
Self-Sufficiency, 2. Succorance, 3. Need to Excel, 5. Thoughtfulness,
7. Welfare of Loved Ones, 14. Depression, 15. Sensitivity,
16. Cynicism, 19. Hostility I, 26. Psychopathic Personality, 33. Sex Drive,
and 34. Hostility II. The six items used for each of these dimensions
have been deposited with the American Documentation Institute.²
For each item, the mean, standard deviation, and factor loading are
given. The remaining dimensions cannot be considered sufficiently
refined to warrant presentation in complete form at the present
time.

Factor Analysis of Total Dimension Scores. Intercorrelations
among the 35 total dimension scores computed over the 436 cases,
using Pearson r, are given in Table 1.² This 35 × 35 matrix was
factor analyzed by the same methods used for the analyses of items.
Ten factors were extracted and rotated. Space prevents the
presentation of all except the most important of these factors, i.e., the
following:

Factor I. Neuroticism. As in the previous analysis, the first factor
with the greatest variance seems to merit the name “Neuroticism.”
Loadings in the range .3 to .4 occurred for dimensions 2, 25, and 34.
Loadings of from .40 to .49 occurred for dimensions 5, 15, 24, and 35.
The remaining loadings were: 8. Agitation, .71; 9. Irresponsibility,
.50; 14. Depression, .67; 17. Shyness, .57; 27. Inferiority Feelings,
.74; and 28. Confidence, -.57.

² Items for the 12 dimensions given above and the table of intercorrelations
among the total dimension scores (Table 1) have been deposited with the
American Documentation Institute. Order Document No. 7162 from the
Chief, ADI Auxiliary Publications Project, Photoduplication Service, Library
of Congress, Washington 25, D. C., remitting in advance $1.25 for 35 mm.
microfilm or $1.25 for photoprints.

Factor III. Dependence. The major loadings on this factor were 
for dimensions indicating the extent to which the individual admits 
being dependent on other people for aid and approval. Loadings 
were: 1. Social Self-Sufficiency, -.41; 2. Succorance, .54; 15.
Sensitivity, .37; 23. Need for Conformity, .66; and 36. Need for Social
Approval, .74. 

Factor IV. Compulsiveness. The two dimensions with highest
loadings on this factor seem to justify the application of this
time-honored label. There is no particular indication at this point,
however, that any pathological component dominates this factor. It
seems to emphasize adherence to certain traditional values in our
culture. Loadings were: 3. Need to Excel, .40; 7. Welfare of Loved
Ones, .88; 12. Rhathymia, -.33; 20. Drive to Finish, .63; 21. Need
for Security, .39; and 29. Need for Order, .59.

Factor V. Friendliness. Neither of the two dimensions with major
loadings on this factor was one of the better dimensions with respect
to other psychometric criteria considered. With further refinement,
these two dimensions may serve to define a relatively important
aspect of personality. The dimensions and their loadings for this
factor were: 4. Benevolence, .59; 11. Friendliness, .67; 17. Shyness,
-.41; and 32. Impulsiveness, .37.

Factor IX. Hostility. A factor of Hostility also appeared in the
previous analysis of total score dimensions with loadings of .61, .50,
and -.49, respectively, for Hostility, Cynicism, and Friendliness.
The Hostility dimension in that analysis was roughly the same as
Hostility II in this analysis. The present Hostility factor is
somewhat better defined than the last one, offering considerable promise
for a useful clinical scale. The loadings of .3 or more were: 4.
Benevolence, -.32; 12. Rhathymia, .48; 16. Cynicism, .57; 19.
Hostility I, .44; 24. Paranoia, .41; 26. Psychopathic Personality
and 34. Hostility II (loadings illegible in the source).


Discussion 
The present investigation has resulted in the development of
several factored homogeneous item dimensions of personality, each
of which emerged from a factor analysis of items and each of which
had several items with loadings of about .5 or more. These items
with high factor loadings will serve where necessary as nuclei for
developing longer scales with greater reliability. Although most of
these dimensions had substantial loadings on common factors
derived from intercorrelations of total dimension scores, each of
them contributes a substantial amount of true variance not shared
by some other dimension included in the present analysis.
Development of these and other dimensions to a high level of refinement
should provide useful tools for research work in the area of
personality measurement.

Many psychologists have been quick to condemn the personality 
inventory as useless, on the basis of unsatisfactory results which 
have been obtained in the past with certain instruments. It does 
not seem reasonable, however, to assume that nothing can be done 
with this form of measurement because past results have not been 
satisfactory. In this author's opinion, every effort should be made
to perfect this form of measurement to the extent that it is possible 
to do so. The ease of application of the inventory in comparison 
with other personality measurement devices renders it the most 
convenient method available, and hence the method of choice where 
it can be successfully applied. 


REFERENCES 


Cattell, R. B. Personality and Motivation Structure and
Measurement. New York: World Book Company, 1957.

Comrey, A. L. “A Factor Analysis of Items on the MMPI
Hypochondriasis Scale.” EDUCATIONAL AND PSYCHOLOGICAL
MEASUREMENT, XVII (1957), 568-577.

Comrey, A. L. “Comparison of Two Analytic Rotation Procedures.”
Psychological Reports, V (1959), 201-209.

Comrey, A. L. “Factored Homogeneous Item Dimensions in
Personality Research.” EDUCATIONAL AND PSYCHOLOGICAL
MEASUREMENT, XXI (1961), 417-431.

Comrey, A. L. “The Minimum Residual Method of Factor
Analysis.” Psychological Reports, in press.

Comrey, A. L. and Soufi, A. “Further Investigation of Some Factors
Found in MMPI Items.” EDUCATIONAL AND PSYCHOLOGICAL
MEASUREMENT, XX (1960), 779-786.

Comrey, A. L. and Soufi, A. “Attempted Verification of Certain
Personality Factors.” EDUCATIONAL AND PSYCHOLOGICAL
MEASUREMENT, XXI (1961), 113-127.

Guilford, J. P. Personality. New York: McGraw-Hill, 1959.

Kaiser, H. F. “The Varimax Criterion for Analytic Rotation in
Factor Analysis.” Psychometrika, XXIII (1958), 187-200.

Levonian, E., Comrey, A., Levy, W. and Procter, D. “A Statistical
Evaluation of the Edwards Personal Preference Schedule.”
Journal of Applied Psychology, XLIII (1959), 355-359.

Wheeler, W. M., Little, K. B., and Lehner, G. F. J. “The Internal
Structure of the MMPI.” Journal of Consulting Psychology, XV
(1951), 134-141.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 3, 1962


EFFECT OF ANXIETY ON VERBAL AND MATHEMATICAL 
EXAMINATION SCORES*


JOHN W. FRENCH 
Educational Testing Service 


A bitter complaint of some students who take college entrance 
tests is that the tests are taken under such conditions of pressure 
that it is impossible for them to show their best work. The primary 
aim of this research was to learn about the emotional states aroused 
in students at the time of college entrance testing and to obtain 
estimates of the effect of these states on test scores and on the 
validity of the test. It is important also to consider the question of
how to heighten desirable emotional states and how to avoid
undesirable ones. Although no attempt was made in this experiment to
alter the subjects’ reactions, it was felt that a first step would be
taken toward doing this by asking students what they become
anxious about and by obtaining information on what groups of
students are most strongly affected by the pressures put on them
by an examination. It was thought, for example, that in some
people fear of failure may facilitate preparation for a test, may key
a person up to unusual effectiveness during the test, or may inspire
great effort to obtain high grades in college. On the other hand, for
other people fear of failure might have an inhibiting effect on any
or all of these.

Although there is a considerable literature on anxiety and its 
correlates, room is still left for speculation on the effect of anxiety 
on test scores, because a direct comparison of the same students


* This project was supported by the College Entrance Examination Board. 
Acknowledgment is also made for important help received from Frederic M. 
Lord, who suggested the “simultaneous equation” analysis and answered a
number of theoretical and statistical questions. 




working on parallel scholastic aptitude tests under relaxed
conditions and under conditions of natural anxiety has rarely been made.
An approach to these conditions was taken by Lund (1953), who 
administered an untimed test with items in conventional order of 
increasing difficulty and another untimed test with some difficult 
items placed first. The subjects did more poorly on the second test 
presumably because of anxiety arising as the subjects encountered 
difficult items at the beginning. It is well known that placement of 
difficult items at the start of a test will slow subjects down, but, 
since this test was not timed, it seems reasonable to conclude that 
anxiety had a disruptive effect. In another experiment which
afforded a true comparison of relaxed and anxious testing conditions,
Lazarus and Eriksen (1952) produced stress in college students
working on the Wechsler Digit-Symbol test by telling them they
had done poorly. They found that stress improved scores for
students with high academic standing, while it decreased scores for
those with low standing.

Much attention has been given to the correlation between scores 
on an anxiety scale and level of intelligence. In most cases a low 
negative correlation has been found (Sarason & Mandler, 1952; 
Sarason, 1957; Spielberger, 1959; Matarazzo, et al., 1954; Hastings, 
1944; Sarnoff, et al., 1959; Kerrick, 1955; Purcell, et al., 1952;
Calvin, et al., 1955; Zweibelson, 1956; Dreger & Aiken, 1957). The 
first two listed above were using the College Board Scholastic Apti- 
tude Test (SAT). Sarason and Mandler (1952) also note a low 
positive correlation between anxiety and the Henmon-Nelson intelli- 
gence test, and Sarason (1957) a low positive correlation between 
general anxiety and grade-point-average. Some correlations around 
zero were also found by a few of the workers mentioned above and 
a number of others (Spielberger, 1959; Dreger & Aiken, 1957;
Sarason, 1956; LaMonaca & Berkun, 1959; Schulz & Calvin, 1955;
Klugh & Bendig, 1955; Matarazzo, et al., 1954). Alpert and Haber
(1960) found “general” [...] than specific “test” [...] for
debilitating a[...] other. Even though [...]


of anxiety, its relation to intelligence, this does not answer the 
question at issue in the present experiment. Evidence that anxiety 
is usually found to accompany low test scores proves nothing about 
the part that anxiety plays in bringing about the low scores. 


Data-Gathering Procedures 


a. Data for estimating the effect of examination pressure on test 
scores. Arrangements were made to have students at sixteen high 
schools take, with their January 1960 College Board Scholastic
Aptitude Test (SAT), one of four specified half-hour test forms 
having material parallel to that in the regular operational parts of 
the SAT. (Since the experimental tests were printed in the same 
booklets with the regular tests and there was no indication of which 
questions were experimental, it is likely that the students assumed 
they were part of the regular test.) These experimental, nonopera- 
tional sections of the SAT will be referred to as Verbal-A, Verbal-B, 
Mathematical-A, and Mathematical-B. A few days before the SAT 
for some students and a few days after it for others, the students 
who took Verbal-A with the SAT were given a special administra- 
tion at their school of Verbal-B, and vice versa, and students who 
took Mathematical-A with the SAT were given a special admin- 
istration of Mathematical-B, and vice versa. The table below sets 
forth information about the participating groups and the design of 
the testing program. Students at each pair of schools underwent one 
of eight testing sequences, as shown in Table 1. 
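
The eight testing sequences follow from crossing the three two-level factors named in the paragraph above: the kind of material, the form printed in the SAT booklet, and whether the special administration came before or after the SAT. The sketch below enumerates them. The field names are illustrative assumptions, not the author's notation, and the code shows only the counterbalancing logic, not the actual assignment of school pairs.

```python
from itertools import product

# Three two-level factors taken from the description of the design.
# The labels themselves are illustrative.
CONTENT = ("Verbal", "Mathematical")
FORM_WITH_SAT = ("A", "B")            # form taken with the SAT ("anxious" test)
RELAXED_TIMING = ("before", "after")  # special administration relative to the SAT


def testing_sequences():
    """Return the 2 x 2 x 2 = 8 counterbalanced testing sequences."""
    sequences = []
    for content, anxious_form, timing in product(
        CONTENT, FORM_WITH_SAT, RELAXED_TIMING
    ):
        # The special ("relaxed") administration always uses the alternate form.
        relaxed_form = "B" if anxious_form == "A" else "A"
        sequences.append(
            {
                "content": content,
                "anxious_form": anxious_form,
                "relaxed_form": relaxed_form,
                "relaxed_timing": timing,
            }
        )
    return sequences


if __name__ == "__main__":
    for seq in testing_sequences():
        print(seq)
```

Each of the eight sequences would then be assigned to one pair of schools, as the text describes.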


TABLE 1 
Participating Groups and Design of Testing Program 


School number    | 4 | 15 | 14 | 16 | 6 | 7 | 3 | 9 | 1
Location (State) | NY | DC | RI | NJ | Mass | NJ | Mass | NY | Mass | NY | Pa
Number of boys   | 65 | 72 | 81 | 144 | 59 | 64 | 132 | 65 | 82 | 53 | 94
Number of girls  | 62 | 43 | 51 | 58

A few days before SAT | Verb. A | Verb. B
With SAT              | Verb. B | Verb. A
A few days after SAT  |




It was assumed that the students believed that the experimental 
tests given with the SAT contributed to the scores sent to the
colleges of their choice. Thus pressure or “anxious” conditions may be
said to have prevailed. At the special administrations a few days 
before or a few days after the SAT, on the other hand, “relaxed” 
conditions were set up. In order to create a relaxed attitude and yet 
to maintain motivation on the part of the students, the following 
announcement was made just prior to the test: "This is a test con- 
taining questions like some of the ones in the SAT. It will be scored 
at Educational Testing Service for research purposes. The scores 
will not be reported to any college, but will be reported to your 
school. Please do your best.” 

The sixteen schools were selected from among public high schools 
at which about 150 students were expected to take the SAT in 
January 1960. They were paired for an expected total close to 300 
and representation from two geographical regions. The pairs were 
then assigned to the eight testing sequences at random. A few of the 
originally selected schools were not able to participate in the experi- 
ment and were replaced by other schools in the same state having a 
similar number of students expected to take the SAT. 

b. Data about the students’ anxiety reactions and relevant back- 
ground factors. It was considered likely that the test scores would 


show that some students suffer ill effects on a pressure test while 
others do not. It cannot be assumed 


Personality Research Inventory (Saunders, 1955), and 17 of the 20 
items from an anxiety scale by Christie (Christie & Budnitzky, 




were omitted from the original Christie Scale to avoid public
relations problems with the schools.

The first part of the questionnaire asked about the student's 
family, his educational and vocational plans, whether he intended 
to apply for a scholarship, and what sorts of characteristics of a 
college would affect his choice. Responses to these items coupled 
with findings with regard to test scores are used in learning some- 
thing about the kind of person affected by anxiety and the kind of 
pressures that produce anxiety. Many more facts about the students 
and their lives would be pertinent to an investigation such as this, 
but it was considered desirable to hold the questionnaire to a length 
that would permit completion within one class period. 

c. School achievement data. The participating schools supplied
the students’ course grades in mathematics, science, social studies,
and English. Averages for these subjects over the eleventh grade and
the first half of the twelfth grade were used to compare the
concurrent validities of the relaxed test, the anxious test, and the SAT. The
English grades also served in a trial analysis of the relationship 
between the effect of anxiety on test scores and over-achievement. 


Analysis of Test Validity 


Analysis of the intercorrelation tables computed separately for 
each school asked whether anxiety affected the validity of the tests. 
Validity in this analysis is concurrent rather than predictive; it is 
the correlation of the tests with school grades. For the anxious and 
relaxed tests, medians were found across the eight schools where
these tests were verbal and the eight schools where they were mathe- 
matical. For purposes of comparison, the median for SAT-V was 
taken across the eight schools where the experimental tests were 
verbal, and the median for SAT-M was taken across the eight 
schools where the experimental tests were mathematical. The results 
are given in Table 2. 

When the corresponding “anxious” and “relaxed” tests are 
compared, the hypothesis that anxiety reduces the validity of the 
test is not borne out. 

It can be hypothesized that the SAT-V is taken under conditions 
of greater anxiety than any of the other sections, since it appears at 
the beginning of the SAT, when the candidates are most nervous. 
Table 2 shows that the validity of SAT-V is not lower than the
corrected values of the validity for the experimental anxious and
relaxed verbal tests. In mathematics, the relaxed test has slightly
lower validity, if anything, than the more anxious tests.


TABLE 2

Comparison of Validities for SAT, the Anxious Test, and the Relaxed Test

                                   Verbal    Verbal             Math.     Math.
Course                    SAT-V    Anxious   Relaxed   SAT-M    Anxious   Relaxed

Males
Mathematics               .484     .421      .479      .563     .603      .564
Science                   .434     .482      .500      .615     .607      .569
Social Studies            .547     .499      .505      .393     .453      .396
English                   .554     .550      .488      .491     .484      .462

Females
Mathematics               .521     .469      .477      .654     .556      .524
Science                   .509     .469      4719      .455     .396      .400
Social Studies            .607     .590      .544      .415     .378      .399
English                   .656     .581      .536      .489     .438      .401

Both Sexes and All Courses
Median of column          .534     .490      .494      .490     .468      .431
Corrected for length
to that of the SAT        .534     .522      .527      .490     .499      .459


The Simultaneous Equations Analysis 


Central to the purpose of this study is an estimate of the actual
amount in score points of the effect of anxiety on the test scores of
different categories of candidates. However, within the framework
of a study conducted in conjunction with the regular SAT, there was
no way to avoid a confounding of fatigue and practice with anxiety.
[...] a balanced design, using alternate test forms for
the “anxious” and “relaxed” tests, the operational part of the SAT
[...]. Since the experimental section in the
SAT occupied the last half hour of the three-hour
examination, it was presumably subject to some conditions differing from
those prevailing during the earlier scorable sections of the
examination. These would be more fatigue, more practice, and, possibly, a
different amount of anxiety. Balancing out these effects is not
simple. For example, even though the experimental design provided for
a relaxed test after the SAT for some schools and before the SAT
for other schools, a simple averaging of those scores does not
necessarily bring about a figure subject to a resultant practice effect
equivalent to that operating on the anxious test taken with the SAT.

The following four values constitute the experimental variables 
that were used in the computation of the effect of anxiety for any 
category of students: 


A_1, anxious test score for students having the relaxed test before
the SAT

A_2, anxious test score for students having the relaxed test after
the SAT

R_1, relaxed test score for students having the relaxed test before
the SAT

R_2, relaxed test score for students having the relaxed test after
the SAT.


Each of these figures was obtained, first, by averaging over students 
in the pair of schools taking each test form (Verbal-A or Verbal-B, 
Math.-A or Math.-B) and, then, averaging the Verbal-A average 
with the Verbal-B average and the Math.-A average with the Math.- 
B average. Except for unexpected interactions of practice or anxiety 
with test form, this averaging served to balance out differences in 
difficulties of the test forms. 
The following model was then set up: 


A_1 = B_1 + 1.4P + A
A_2 = B_2 + P + A
R_1 = B_1
R_2 = B_2 + 1.4P

where B_1 denotes the basic ability of the students taking the
relaxed test before the SAT
B_2 denotes the basic ability of the students taking the
relaxed test after the SAT
P denotes the effect of practice from 75 minutes of
testing on similar materials (e.g., verbal)
A denotes the effect of anxiety (and/or fatigue, which
is confounded with it inextricably because of the
position of the experimental section within the SAT).
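
The four equations can be solved directly for the four unknowns. The sketch below is a minimal illustration, not the author's computing procedure. It assumes the practice coefficient is 1.4, read as 105/75: a 30-minute test added to 75 minutes of similar material, with practice taken to grow in proportion to testing time. The function and argument names are hypothetical.

```python
def solve_anxiety_model(a1, a2, r1, r2, k=1.4):
    """Solve the four-equation model for B_1, B_2, P and A.

    Model (k is the assumed practice coefficient, 1.4 = 105 / 75):
        A_1 = B_1 + k*P + A
        A_2 = B_2 + P + A
        R_1 = B_1
        R_2 = B_2 + k*P
    """
    b1 = r1
    # (A_1 - R_1) = k*P + A and (A_2 - R_2) = (1 - k)*P + A,
    # so their difference isolates P.
    p = ((a1 - r1) - (a2 - r2)) / (2 * k - 1)
    a = (a1 - r1) - k * p
    b2 = r2 - k * p
    return {"B1": b1, "B2": b2, "P": p, "A": a}


if __name__ == "__main__":
    # Made-up group means in raw-score units, for illustration only.
    print(solve_anxiety_model(a1=21.8, a2=22.0, r1=20.0, r2=23.8))
```

Because P is identified only through the contrast between the two groups, a different assumption about how practice accumulates changes k but leaves the solution method unchanged.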




In order to provide solvable equations, this model entails an as- 
sumption with regard to practice. It is assumed that the effect of 
practice increases steadily with amount of testing, over the range of 
testing times represented in this experiment. This particular assump- 
tion was made because Levine and Angoff (1958) reported that, for 
both SAT-V and SAT-M, the practice effect is about 10 points (one- 
tenth of a standard deviation) from one recently taken SAT, about 
10 additional points from a second SAT, and nothing more beyond 
that. As long as the assumption of a steady increase of practice 
effect with testing time was incorporated in the model, it was not 
necessary to assume anything about the absolute amount of the 
practice effect. Amount of practice effect, P, remained, therefore, as 
a dependent variable, making a total of four unknowns in the four 
equations. 

Other possible assumptions with regard to the effect of practice 
were tried out with one set of data. However, no important differ- 
ences in the conclusions were evident, even where these assumptions 
were rather extreme, such as the assumption that all practice effects 
appear after 75 minutes, and no additional effects appear there- 
after. 

Since the model was entered with test means in raw score units, it 
was necessary to multiply the roots of the equations by a constant 
so that they would appear, for the sake of easy interpretation, in 
terms of the College Board standard scale, which has a mean of 
500 and a standard deviation of 100. The constant used was the 
conversion constant for the January 1960 SAT multiplied by the 
ratio of standard deviations for a 75-minute test (with a reliability 
of .91) and a 30-minute test containing parallel material. 
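
The text does not say how the ratio of standard deviations was obtained. One way it could be computed, assuming classical test theory and the Spearman-Brown relation between test length and reliability, is sketched below. The conversion constant for the January 1960 SAT is not given in the text, so it is left as a parameter, and all function names are hypothetical.

```python
import math


def shortened_reliability(r_long, k):
    """Reliability of a test 1/k as long, by inverting the Spearman-Brown formula."""
    return r_long / (k - (k - 1) * r_long)


def sd_ratio_long_to_short(r_long, k):
    """Ratio of raw-score standard deviations, long test over short test.

    Uses the classical result that lengthening a test k times multiplies
    its variance by k * (1 + (k - 1) * r_short).
    """
    r_short = shortened_reliability(r_long, k)
    return math.sqrt(k * (1 + (k - 1) * r_short))


def to_scaled_points(raw_effect, conversion_constant, r_long=0.91, k=75 / 30):
    """Express a raw-score effect on the 30-minute test in scaled-score points.

    conversion_constant is the number of scaled points per raw point on the
    75-minute test; its value is not given in the text.
    """
    return raw_effect * conversion_constant * sd_ratio_long_to_short(r_long, k)


if __name__ == "__main__":
    print(round(shortened_reliability(0.91, 75 / 30), 3))
    print(round(sd_ratio_long_to_short(0.91, 75 / 30), 3))
```

Under these assumptions a 30-minute test parallel to a 75-minute test of reliability .91 would have a reliability of about .80, and the standard deviation ratio would be about 2.35.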

The major findings appear in Table 3. The significance of the
values in the table was estimated through the consideration that
they were computed as weighted composites.

Table 3 shows significant and somewhat larger values than
expected for the effect of practice. The range is from 11 to 17 points,
rather than 10 points. None of the results for anxiety are significant,
but the difference between the effect of anxiety on the verbal test
and on the mathematical test is significant at [...]. The difference
indicates that pressure conditions tend to raise mathematical scores
for girls relative to their verbal scores. This is opposite to the
result that might have been anticipated. Unfortunately, in this area
it is probably possible to offer a plausible explanation for any
finding that may occur. Here is one such explanation: In situations
like those obtaining in this experiment, girls, disliking mathematics
as many of them do, may simply give up or fail to cooperate on a
“relaxed” test, while their efforts on an “anxious” test bear at least
some fruit. With verbal materials, on the other hand, relaxed
conditions may be most suitable for permitting full expression of
the girls’ more natural interests.


TABLE 3

Anxiety and Practice Effect in Scaled Score Points for All Cases and by Sex
(Negative values indicate that the anxious test scores were
lower relative to the relaxed test scores)

                 Verbal                        Mathematical
           N     Anxiety   Practice      N     Anxiety   Practice
Boys      661       1        11         575      -2        17
Girls     427     -11        14*        378      10        11
Both     1088      -4        12**       948       3        15**

* Significant at .05 level.
** Significant at .01 level.

Table 4 contains the computed values for the effect of anxiety and 


TABLE 4


Effect of Anxiety and Practice for Groups
Making Different Responses to the Questionnaire


                                Verbal                Mathematical
                           Anxiety   Practice      Anxiety   Practice
                           M    F    M    F        M    F    M    F

Group means from Table 3   1   -11   11   14      -2   10   17   11


Feelings 1- not anxious —4 —13 17 26 -11 —12 13 15 
on test 2- slightly $ 6 7 91 МЕЗЕТТЕ 

` 3- anxious з —il 11 —10 1 20 21 10 
Christie 1- low -11 -9 24 -1 6 и 10 20 
Anxiety 2- medium 3) 28 4з @ а 
3- high 12 —15 10 20 —10 12 26 15 

SAT-V 1- low B -8 3. 295195 D ore 
2- medium —2 -—18 15 21 1 18. 12 aa 

3- high I -9 1 3 —Do 7 1 

SAT-M 1- low & 19 7. 911/51 019 RS MC 
2- medium Дт. <9 117. 9797 ЗОБУ 

3- high -1 -6 7 9 -15 106 4 



practice for students indicating different intensities of anxious
feelings during the test, having different scores on the Christie
Anxiety Scale, and standing at different levels on the SAT.

The only significant result for anxiety here is the one showing
that girls not only do better on an anxious test of mathematics, but
the advantage goes to precisely those girls who feel most anxious.
Results for the anxiety scale, which measures general anxiety rather
than specific test anxiety, do not confirm this pattern.

Insignificant results appearing in Table 4 for the groups classified
into low, medium, and high on SAT-V or SAT-M indicate that test
anxiety is not a detriment to performance even when the test is
difficult for the subjects. Except for the results for boys on SAT-M,
this suggests a contradiction to the findings of several workers, who
have shown that the difficulty of a test is particularly detrimental to
the performance of high-anxious subjects (Sarason & Palola, 1960;
Wiener, 1959; McKeachie, 1958).

Many other comparisons were made on the basis of questionnaire
responses and anxiety scales, but there were very few significant
deviations from the group means, and these were not confirmed by
similar deviations for the opposite sex or kind of test. This leads to
the conclusion that the effect of anxiety as defined in this study is
small on the average, particularly when compared to the standard
error of measurement in the SAT of about 30 points, and is not
highly important in any obvious grouping of students.


Summary 
Are there some [...] best levels when [...] test to some candidates
a few [...] after the [...] background, academic [...], and reasons
for feeling anxious. [...] scales measuring generalized anxiety were
included. Teachers’ grades in four different high-school subjects
were collected.

The results show the effects of anxiety to be small. A comparison
of scores on the relaxed tests with alternate test forms taken with
the college entrance test indicates no effect of anxiety for boys, while
anxiety seems to be associated with a slight but significant im- 
provement of girls’ mathematical scores relative to their verbal 
scores. The relaxed test, the “anxious” test taken with the SAT, and
the SAT itself were all found to have substantially the same
concurrent validities. Therefore, it is unlikely that anxiety has had any
effect on predictive validity. Responses to the questionnaire brought 
out expected statements about anxious feelings. Girls who felt 
anxious showed up well on the mathematics test. However, for the 
other categories of students under observation, differences in the 
effect of anxiety were not consistent and were rarely significant. All 
of them were below the standard error of measurement of the test.


REFERENCES 


Alpert, R. and Haber, R. N. “Anxiety in Academic Achievement
Situations.” Journal of Abnormal and Social Psychology, LXI
(1960), 207-215.

Calvin, A., Koons, P., Jr., Bingham, J., and Fink, H. “A Further
Investigation of the Relationship Between Manifest Anxiety
and Intelligence.” Journal of Consulting Psychology, XIX (1955),
282.

Christie, R. and Budnitzky, S. “A Short Forced-Choice Anxiety
Scale.” Journal of Consulting Psychology, XXI (1957), 501.

Dreger, R. M. and Aiken, L. R. “The Identification of Number
Anxiety in a College Population.” Journal of Educational
Psychology, XLVIII (1957), 344-351.

Hastings, J. “Tension and School Examinations.” Journal of
Experimental Education, XII (1944), 143-162.

Howe, E. S. and Silverstein, A. B. “Comparison of Two Short-Form
Derivations of the Taylor Manifest Anxiety Scale.”
Psychological Reports, VI (1960), 9-10.

Kerrick, J. “Some Correlates of the Taylor Manifest Anxiety Scale.”
Journal of Abnormal and Social Psychology, L (1955), 75-77.

Klugh, H. and Bendig, A. “The Manifest Anxiety and ACE Scales
and College Achievement.” Journal of Consulting Psychology,
XIX (1955), 487.

LaMonaca, H. L. and Berkun, M. M. “Army Data on Taylor MAS,
Intelligence, and Ego Strength.” EDUCATIONAL AND
PSYCHOLOGICAL MEASUREMENT, XIX (1959), 577-578.

Lazarus, R. S. and Eriksen, C. W. “Effects of Failure Stress Upon
Skilled Performance.” Journal of Experimental Psychology,
XLIII (1952), 100-105.

Levine, R. S. and Angoff, W. H. “The Effects of Practice and Growth
on Scores on the Scholastic Aptitude Test.” College Entrance
Examination Board Research and Development Report, 1958.

Lund, K. W. “Test Performance as Related to the Order of Item
Difficulty, Anxiety, and Intelligence.” Unpublished Ph.D. thesis,
Northwestern University, 1953.

Matarazzo, J., Ulett, G., Guze, S., and Saslow, G. “The Relationship
Between Anxiety Level and Several Measures of Intelligence.”
Journal of Consulting Psychology, XVIII (1954), 201-205.

McKeachie, W. J. “Students, Groups, and Teaching Methods.”
American Psychologist, XIII (1958), 580-584.

Purcell, C., Drevdahl, J., and Purcell, K. “The Relationship Between
Altitude-IQ Discrepancy and Anxiety.” Journal of Clinical
Psychology, [...]

[...] Psychology, XX (1956), 220-222.

Sarason, I. G. “Test Anxiety, General Anxiety, and Intellectual
Performance.” Journal of Consulting Psychology, XXI (1957),
[...].

Sarason, I. G. and Palola, E. G. “The Relationship of Test and
General Anxiety, Difficulty of Task, and Experimental
Instructions to Performance.” Journal of Experimental Psychology,
LIX (1960), 185-191.

Sarason, S. B. and Mandler, G. “Some Correlates of Test Anxiety.”
Journal of Abnormal and Social Psychology, XLVII (1952),
[...].

Sarnoff, I., Sarason, S., Lighthall, F., and Davidson, K. “Test
Anxiety and the ‘11 Plus’ Examinations.” British Journal of
Educational Psychology, XXIX (1959), 9-16.

Saunders, D. R. “Some Preliminary Interpretive Materials for the
PRI.” Princeton, N. J.: Educational Testing Service Research
Memorandum 55-15, 1955.

Schulz, R. and Calvin, A. “A Failure to Replicate the Finding of
a Negative Correlation Between Manifest Anxiety and ACE
Scores.” Journal of Consulting Psychology, XIX (1955), 22[...].

Spielberger, C. D. and Katzenmeyer, W. G. “Manifest Anxiety,
Intelligence, and College Grades.” Journal of Consulting
Psychology, XXIII (1959), 278.

Wiener, G. “The Interaction Among Anxiety, Stress Instructions,
and Difficulty.” Journal of Consulting Psychology, XXIII
(1959), [...].

Zweibelson, I. “Test Anxiety and Intelligence Test Performance.”
Journal of Consulting Psychology, XX (1956), 479-481.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT
Vol. XXII, No. 3, 1962


THE RELATIONSHIP BETWEEN ITEM DIFFICULTY 
AND TEST VALIDITY AND RELIABILITY 


CHARLES T. MYERS 
Educational Testing Service 


Introduction 


THERE appears to be a cultural lag in the practice of test con- 
struction. In spite of the general agreement among test theorists— 
such as Cronbach and Warrington (1952), Gulliksen (1945), Lord 
(1952), Richardson (1936), and Thurstone (1932)—that, for maxi- 
mum reliability, tests should be homogeneous with respect to item 
difficulty, those who produce standardized tests continue to follow a
tradition that the items selected for a test should represent a wide
range in difficulty.

Let us examine this paradox. We may note in passing that the 
theorists are concerned with difficulty only in a statistical sense, that 
is, with the per cent of the sample who answer an item correctly. The
test builder, on the other hand, is also concerned with difficulty in a
psychological sense as it affects the morale or behavior of the test
taker. One may reasonably assume that these two concepts of
difficulty are imperfectly correlated.

The test constructor is, in general, not primarily concerned with 
the reliability of his product. Validity is usually considered to be of 
greater importance and reliability is often sought after mainly as a
means to the end of increased validity. It is generally accepted that,
other things being equal, an increase in reliability will produce an
increase in validity although the increase in validity will be smaller
(Brogden, 1946; Ferguson, 1941; Tucker, 1946).

There are some practical reasons for constructing tests according
to the established pattern. This pattern has become widely accepted
and a test publisher must sell his tests to remain in business.
Furthermore, it may be expensive to collect a sufficient number of items
within a restricted range of difficulty levels. Also, most standardized 
tests are designed to be used by schools with widely different stand- 
ards. The test constructor may doubt the applicability of the theory 
on at least two grounds. Are the tests he constructs sufficiently 
homogeneous factorially to fit the theory and would restricting the 
difficulty range of the items adversely affect other characteristics of 
the test? 

The difficulty of an item may be affected by many different things 
such as ambiguity or complexity in the phrasing of the question, 
the reasonableness of the wrong alternatives, or the examinees’
familiarity with the area of knowledge being sampled. We often
hope that our difficult items are those that measure complex mental 
processes such as induction and deduction, criticism and application, 
rather than so-called simple recall. It is very difficult to determine 
what makes an item difficult. Therefore we hesitate to change a prac- 
tice which has been considered adequate. 

The question to which this paper is addressed is whether or not, 
under certain conditions likely to be found in common practice, a
selected set of average difficulty items would be more reliable and
more valid for the purpose for which they were intended than would
be a selected set of items half of which were considerably more
difficult and half of which were considerably easier than the average.


The Variables 

The test that was used in this research was one which had been
designed to predict the scholastic aptitude of college students. It
consisted of 150 items, 75 of which were quantitative aptitude items
and 75 verbal aptitude. The test was relatively unspeeded and
consisted entirely of five-choice items. From this test four sets of items
were selected to form two pairs of subtests. These items were
selected on the basis of their difficulty as indicated by an item
analysis of the responses of 500 freshmen randomly selected from the
40,000 freshmen who took the test. (It would have been impractical
to have made the selection of items on the basis of the responses of
the experimental groups.) Two similar subtests were selected from
the items whose difficulty fell within the limits of 40 per cent passing
and 74 per cent passing. These subtests will be called the “peaked”
tests from the shape of their distribution of item difficulty. The
other two tests were selected from the items outside this range, with
one-half the items in each subtest easy and the other half hard.
These subtests will be called the “U-shaped” tests to describe their
distribution of item difficulty. Each of the four subtests was half
quantitative and half verbal.
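
The first step of the selection just described, sorting items into a middle-difficulty pool and an outside pool, can be sketched as follows. The item identifiers and percentages are invented for illustration, and treating the 40 and 74 per cent limits as inclusive is an assumption. The later matching of subtests on mean difficulty and biserial correlation is not shown.

```python
def split_item_pools(percent_passing, low=40.0, high=74.0):
    """Sort items into the pool for 'peaked' subtests and the pools for
    'U-shaped' subtests by per cent passing.

    percent_passing maps an item identifier to the per cent of the item
    analysis sample answering it correctly. The limits are assumed inclusive.
    """
    pools = {"peaked": [], "u_shaped_easy": [], "u_shaped_hard": []}
    for item, pct in percent_passing.items():
        if low <= pct <= high:
            pools["peaked"].append(item)
        elif pct > high:
            pools["u_shaped_easy"].append(item)  # more examinees pass: easy item
        else:
            pools["u_shaped_hard"].append(item)  # fewer examinees pass: hard item
    return pools


if __name__ == "__main__":
    # Hypothetical item analysis results.
    example = {"Q01": 85.0, "Q02": 58.0, "Q03": 31.0, "V01": 74.0, "V02": 39.5}
    print(split_item_pools(example))
```

A U-shaped subtest would then draw half its items from the easy pool and half from the hard pool.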

The selection of the items that were to make up the four subtests 
was solely on the basis of their difficulty for the item analysis sam- 
ple. An additional basis was considered in assigning the items to 
their respective subtests. This basis was the biserial correlation of 
the item with appropriate criterion scores. Four subtests, two peaked 
and two U-shaped, with 24 items in each, were selected so that the 
average per cent passed in each subtest ranged from 58 to 59 and 
the average biserial coefficient from .50 to .51. Such factors as item 
type, within the broad categories of verbal and quantitative, and 
position in the test were left to chance. It was considered impracti- 
cal to try to select adequate matched samples on more than two 
bases when selecting from a pool of 150 items. 

The criterion score used for validating the four subtests was 
average grades for the freshman year, normalized for each college 
on a twelve-point scale. All of the colleges used in the study were 
either liberal arts colleges or liberal arts divisions of universities. 
They represented a wide variety both academically and geographi- 
cally. Twelve colleges were included. 


The Population 


The persons used in this study were selected from the freshmen 
at these twelve colleges on one criterion alone, that they had at- 
tempted all the items in the test. There were 1644 freshmen thus 
selected, or approximately 80 per cent of the total of 2015 who had 
taken the test at those colleges. The separate groups varied in size 
from 37 to 392. 


The Analysis 


Four product-moment correlations were computed for the sample 
at each college: the correlation between the two peaked tests and 
the correlation between the two U-shaped tests to obtain two re- 
liability coefficients, and the correlation of grade average with the 
sum of scores on the two peaked tests and the correlation of grade
average with the sum of scores on the two U-shaped tests to obtain
two validity coefficients. (Note that thus reliabilities are for 24-item 
subtests while validities are for 48-item subtests.) 

At the college with the largest group, two tests were made of the 
data. The regression of grades on the subtest scores was found not 
to be significantly curvilinear, and Votaw's test of compound sym- 
metry was applied to determine whether the two tests in each pair 
should be considered strictly parallel. Although the Votaw test in- 
dicated the tests in each pair were not strictly parallel, the differ- 
ences were in the means only. Since the variances and covariances 
were equal, the correlation coefficients may be accepted as accurate 
alternate-form reliability coefficients. 

Wilcoxon's matched pairs signed ranks test was then applied to
compare the reliabilities and the validities of the peaked and
U-shaped tests. Since the samples at the different colleges were
selected in somewhat different ways and since the criterion scores
also differed, it seemed appropriate to use a nonparametric test for
these data. Furthermore, the sampling here is primarily sampling of
items rather than persons and the appropriate standard errors are
not known to the author. The Wilcoxon test is a refinement of the
sign test, taking into account not only the direction of differences
but also their absolute size, smaller differences being treated as
less significant than larger differences. This test was described by
Lincoln E. Moses in an article in the Psychological Bulletin in 1952.
The sign test itself did not indicate any significant differences for
either the reliabilities or the validities.

The comparison of the reliabilities of the two kinds of tests was
made on the basis of 24 correlation coefficients. For each of the 12
colleges the correlation between peaked tests was subtracted from
the correlation between U-shaped tests. The 12 differences were
then arranged in rank order according to absolute size, from smallest
to largest, beginning with one. Of these 12 differences, three were
positive values: the first, the third, and the sixth. The sum of ranks
for the positive differences was therefore 10. Now there are two to
the twelfth power, or 4096, possible ways of assigning positive and
negative signs to 12 ranks. Our null hypothesis assumes that all
these ways are equally likely. For these 4096 combinations the
sums of the positive ranks form a bell-shaped symmetrical
distribution from zero to 78 with a mean of 39 and a variance of
162.5; 43 of the 4096 combinations have sums of 10 or less. Thus
approximately one per cent of the possible combinations are as small
as the sum actually found in this experiment, but since we arbitrarily
assigned the positive and negative directions our results approach
closely the two per cent level of confidence. We may assume the
peaked tests to be more reliable.


TABLE 1
Reliabilities and Validities for Twelve Samples

[Table body not recoverable from the scan. For each of the twelve
samples it reports data for the total test (all freshman applicants)
and data for the subtests (selected groups), including N, the
reliabilities of the peaked and U-shaped tests with the signed rank
of their difference, and the validities of the peaked and U-shaped
tests with the signed rank of their difference.]

Notes: Fixed group selection: all those who completed the test.
Reliability equals correlation between similar 24-item subtests, not
corrected for length.
Validity equals correlation of sum of each pair of subtests
(48 items) with criterion (average grades).

The method just described is precise but cumbersome. A simpler
method depends on the observation that the distribution of sums of
ranks is approximately normal. With larger numbers of ranks the
work of computing the exact values is tedious and the error is less,
so that the assumption of normality is convenient. Using n for the
number of ranks and T for the sum of ranks, Moses gives the
formula for the mean, $\bar{T} = n(n + 1)/4$, and for the variance,
$S^2 = (2n + 1)\bar{T}/6$. Using these formulas would have given us
results at approximately the three per cent level of confidence.
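
The counting argument and the normal approximation described above can be checked with a short sketch. It is not the author's computation: the function name `signed_rank_distribution` is illustrative, and the two-sided probabilities are obtained simply by doubling one tail, with no continuity correction. The only inputs taken from the text are n = 12 ranks, positive ranks 1, 3, and 6, and three positive signs for the sign test.

```python
import math

def signed_rank_distribution(n):
    """counts[s] = number of sign assignments to ranks 1..n whose
    positive ranks sum to s (each rank is either positive or not)."""
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        # Go downward so each rank is used at most once.
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    return counts

n = 12
observed = 1 + 3 + 6                      # sum of positive ranks = 10

counts = signed_rank_distribution(n)
total = sum(counts)                       # 2**12 = 4096 assignments
lower_tail = sum(counts[: observed + 1])  # assignments with sum <= 10
exact_two_sided = 2 * lower_tail / total

# Normal approximation with the mean and variance quoted from Moses.
mean = n * (n + 1) / 4
variance = (2 * n + 1) * mean / 6
z = (observed - mean) / math.sqrt(variance)
normal_two_sided = math.erfc(abs(z) / math.sqrt(2))

# Sign test on the same data: 3 positive differences out of 12.
sign_tail = sum(math.comb(n, k) for k in range(0, 3 + 1))
sign_two_sided = 2 * sign_tail / 2 ** n

print(total, lower_tail, mean, variance)
print(round(exact_two_sided, 4), round(normal_two_sided, 4), round(sign_two_sided, 4))
```

The sign test uses only the count of positive differences, which is why it is less sensitive here than the signed ranks test.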

The same technique was applied to the 12 pairs of validity co- 
efficients with the result that there were 8 positive ranks with a sum 
of 56. Since there is approximately one chance in ten of obtaining a
sum as large as this, we have no grounds for rejecting the null
hypothesis with respect to validity. It should be pointed out, however,
that these validity coefficients were computed for a wide variety of 
groups. In some cases the mean scores of these groups were far 
above the score at which the peaked test would be most efficient, in 
other cases somewhat below that point. We had no way of avoiding 
this difficulty in this experiment. Our conclusion regarding validity 
should be tempered by this consideration. 

The mean reliability coefficient for the peaked tests was .69,
whereas the mean reliability coefficient for the U-shaped tests was
.63. The size of the difference between these coefficients is of
practical significance. However, this difference would not in itself
be expected to produce much difference in validity. Actually the mean
validity coefficient for the pair of peaked tests was .49 and the
mean validity coefficient for the pair of U-shaped tests was .51. This
difference is small, and presumably not significant.
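
Because the reliabilities are for 24-item subtests while the validities are for 48-item pairs, it may help to see what the Spearman-Brown formula projects for a test of double length. The sketch below applies it to the two mean coefficients only. This is an illustration, not the author's procedure, and stepping up a mean is not the same as averaging stepped-up coefficients for the individual colleges.

```python
def spearman_brown(r, k=2.0):
    """Projected reliability when a test of reliability r is
    lengthened by a factor k with comparable items."""
    return k * r / (1.0 + (k - 1.0) * r)

# Mean 24-item reliabilities reported in the text.
peaked_48 = spearman_brown(0.69)
u_shaped_48 = spearman_brown(0.63)

print(round(peaked_48, 3), round(u_shaped_48, 3))
```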

The comparison of reliability coefficients reported in this paper
was made with coefficients for the reliability of 24-item sets so as to
avoid the extra computation involved in correcting for double length.
However, after the report was finished, curiosity prompted us to
make a comparison of corrected reliabilities. To our chagrin this
comparison resulted in a reduction of the significance level so that
the matched pairs signed ranks test no longer reached the five per
cent level. We leave the interpretation of this to wiser heads than
ours.


Conclusion


This experiment did not show any difference in validity between
the peaked and U-shaped tests. It does, however, support the theory
that peaked tests tend to be more reliable than tests with other
distributions of item difficulty when both tests have average item
difficulty at or near the appropriate level. This difference in
reliability was found to occur even with the condition that items
were not homogeneous factorially. Finally, it may be noted that the
relationship between differences in reliability and differences in
validity for the 12 groups appeared to be random, which suggests that
more needs to be known about what it is that makes an item difficult.


REFERENCES

Brogden, H. E. "Variations in Test Validity with Variation in the
  Distribution of Item Difficulties, Number of Items, and Degree
  of their Intercorrelation." Psychometrika, XI (1946), 197-214.
Cronbach, L. J. and Warrington, W. G. "Efficiency of Multiple-
  Choice Tests as a Function of Spread of Item Difficulties."
  Psychometrika, XVII (1952), 127-147.
Ferguson, G. A. "The Factorial Interpretation of Test Difficulty."
  Psychometrika, VI (1941), 323-329.
Gulliksen, Harold. "The Relation of Item Difficulty and Inter-Item
  Correlation to Test Variance and Reliability." Psychometrika,
  X (1945), 79-91.
Gulliksen, Harold. Theory of Mental Tests, Chapter 14. New York:
  John Wiley & Sons, 1950.
Lord, F. M. "The Relation of the Reliability of Multiple-Choice
  Tests to the Distribution of Item Difficulties." Psychometrika,
  XVII (1952), 181-194.
Moses, L. E. "Non-Parametric Statistics for Psychological
  Research." Psychological Bulletin, XLIX (1952), 122-143.
Richardson, M. W. "Relation Between the Difficulty and the
  Differential Validity of a Test." Psychometrika, I (1936), 33-49.
Thurstone, Thelma G. "The Difficulty of a Test and its Diagnostic
  Value." Journal of Educational Psychology, XXIII (1932), 335-343.
Tucker, L. R. "Maximum Validity of a Test with Equivalent Items."
  Psychometrika, XI (1946), 1-5.
Wilcoxon, F. "Individual Comparisons by Ranking Methods."
  Biometrics, I (1945), 80-83.
Wilcoxon, F. "Probability Tables for Individual Comparisons by
  Ranking Methods." Biometrics, III (1947), 119-122.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 3, 1962


THEORETICALLY DERIVED CHANCE SCORES AND 
THEIR NORMATIVE EQUIVALENTS ON A SELECTED 
NUMBER OF STANDARDIZED TESTS 


GILBERT SAX 
University of Hawaii 


Test utilizers generally expect raw scores falling within the realm
of chance to yield significantly low centiles, I.Q.'s, or grade
placement norms. However, some standardized tests yield chance or
guessing scores, which, when converted to their normative
equivalents, yield far higher norms than one would expect.

Some rather serious errors of judgment can occur if educators and
psychologists are not aware of spuriously high normative
equivalents of chance scores. For example, assume that a school
district employs the California Capacity Questionnaire (Sullivan,
Clark & Tiegs, 1945). The manual states that this test is designed
for individuals eleven years of age and older. However, if the test
were actually given to an eleven-year-old, the I.Q. corresponding to
a chance raw score would be 116 and would place the pupil at the 80th
percentile for his age group. Obviously, the test is inadequate for
eleven-year-olds because it is much too difficult for that age group.

The test manual should report the proportion of the standardiza- 
tion group which does no better than chance and the standard devia- 
tion of chance scores. 

These, respectively, may be defined as:

$M_c = n/c$ and $s_c = \sqrt{n(c - 1)}/c$,

where n is the number of items on the examination and c is the
number of choices or alternatives on each item. The chance score
represents the most likely score an individual would obtain if he
responded at random to each item on an examination without being
penalized for responding incorrectly. Such random responding could
occur if the examinee is unmotivated to take the examination, if the
test is poorly designed for his age level because it is too difficult,
or if he does not have sufficient time to complete all items.
Gulliksen (1950, p. 263) has suggested that the lowest interpretable
score on an examination should be at least $2s_c$ above n/c and that
scores falling between n/c and n/c + $2s_c$ should not be construed
as signifying the attainment of subject matter on that examination.
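
The chance score, its standard deviation, and Gulliksen's suggested floor can be sketched as follows. The standard deviation is the ordinary binomial one for n items each answered correctly with probability 1/c. The function names and the 40-item, four-choice test are hypothetical and chosen only for illustration.

```python
import math

def chance_score_stats(n_items, n_choices):
    """Mean and standard deviation of the score obtained by
    answering every item at random (binomial with p = 1/c)."""
    mean = n_items / n_choices
    sd = math.sqrt(n_items * (n_choices - 1)) / n_choices
    return mean, sd

def lowest_interpretable_score(n_items, n_choices, k=2.0):
    """Chance score plus k standard deviations (k = 2 follows the
    suggestion attributed to Gulliksen in the text)."""
    mean, sd = chance_score_stats(n_items, n_choices)
    return mean + k * sd

# Hypothetical test: 40 items, 4 choices per item.
mean, sd = chance_score_stats(40, 4)
floor = lowest_interpretable_score(40, 4)
print(mean, round(sd, 2), round(floor, 2))
```

The sketch assumes every item has the same number of choices; a test mixing item types would need the variances summed item by item.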

Some evidence (Cliff, 1958) indicates that chance scores based ...
examiner is aware that a given norm is at the chance level, serious
errors of interpretation are likely to be made.

Table 1 is designed to indicate the prevalence of spuriously high
normative equivalents of chance scores on a selected number of
standardized tests which do not employ correction-for-guessing
formulas. The assumption is made that the examinee responds in a
random manner to each item and that all items are attempted. The
norms were derived for the lowest appropriate age or grade as
stated in the test manual.

The fact that several test publishers have devised methods which
help test consumers recognize scores at the $M_c$ level is some
evidence that progress is being made in reducing confusion as to the
interpretation of chance score equivalents. The Chicago Reading Test
D1 (Engelhart & Thurstone, 1939), for example, was prepared for use
in grades 6, 7, and 8. The raw score equivalent to a grade score of
6.0 is far above $M_c + 2s_c$, as recommended by Gulliksen.


TABLE 1
Chance Scores and Their Normative Equivalents
on a Selected Number of Standardized Tests

                                 Chance  SD Chance  Percen-  Grade
Test                             Score   Score      tile     Equivalent  I.Q.

American School Achievement
Tests, Primary, Grade 1
  Word Recognition                6      4.50       -        1.2         -
  Word Meaning                    6      4.50       -        1.4         [illeg.]
  Numbers                        11      2.74       -        1.8         -
California Short-Form Test
of Mental Maturity,
Grades 10 to Adult
  Language Section               21      3.93       20       7.9         88
  Non-Language Section           20      3.60       2        5.6         72
Cooperative School and
College Ability Tests,
Form 1C, College Freshmen
  Verbal                         12      3.08       18*      [illeg.]    [illeg.]
Iowa Every-Pupil Test of
Basic Skills, Grades 3-5
  Map Reading                     4.50   1.83       -        [illeg.]    [illeg.]
  Use of Dictionary               4.80   1.88       -        3.7         [illeg.]
Lorge-Thorndike Intelligence
Test, Verbal Battery,
Grade 10-12                      19      3.80       3        [illeg.]    79
Ohio State University
Psychological Test, Grades
9 to College                     30      4.90       35       -           [illeg.]
Otis Quick-Scoring Mental
Ability Tests, Gamma Test,
Senior High and Colleges         17      3.51       [illeg.] [illeg.]    80
SRA Achievement Series,
Form A, Grades 2-4
  Auditory Discrimination        25      3.53       -        [illeg.]    [illeg.]
  Reading Vocabulary             10      2.74       [illeg.] [illeg.]    [illeg.]

* The SCAT manual reports only centile ranges equivalent to 1 SEmeas
from a score equivalent. The reported centile is the median of the
reported range.

the role that chance scores play in test interpretation, and are more
willing to restrict age and grade ranges to more appropriate levels,
measures of predictive and concurrent validity are likely to be in
error.


REFERENCES 


Clark, W. and Tiegs, E. California Short-Form Test of Mental
  Maturity, Manual. Los Angeles: California Test Bureau, 1957.

Cliff, R. "The Predictive Value of Chance-Level Scores."
  EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, XVIII (1958),
  607-616.

Educational Testing Service, Cooperative Test Division.
  Cooperative School and College Ability Tests, Form 1C, 1955.

Engelhart, M. and Thurstone, T. G. Chicago Reading Test D1.
  Milwaukee: E. M. Hale and Co., 1939.

Gulliksen, H. Theory of Mental Tests. New York: John Wiley &
  Sons, 1950.

Henmon, V. A. C. and Nelson, M. J. The Henmon-Nelson Tests of
  Mental Ability. Boston: Houghton Mifflin, 1931.

Lorge, I. and Thorndike, R. L. Lorge-Thorndike Intelligence Tests,
  Level 5. Boston: Houghton Mifflin, 1954.

Otis, A. S. Otis Quick-Scoring Mental Ability Tests, Gamma Test.
  New York: World Book Company, 1937.

Spitzer, H. F. Iowa Every-Pupil Test of Basic Skills. Boston:
  Houghton Mifflin, 1947.

Sullivan, E. T., Clark, W. W., and Tiegs, E. W. California Capacity
  Questionnaire. Los Angeles: California Test Bureau, 1945.

Thorpe, L. P., Lefever, D. W., and Naslund, R. A. SRA Achievement
  Series, Form A. Chicago: Science Research Associates.

Tiegs, E. and Clark, W. California Achievement Tests. Los Angeles:
  California Test Bureau, 1957.

Toops, H. A. Ohio State University Psychological Test. Chicago:
  Science Research Associates, 1941.

Young, R. V., Pratt, W. E., and Gatto, F. American School
  Achievement Battery, Primary Battery I. Public School Publishing
  Company, 1941.



EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT
Vol. XXII, No. 3, 1962


THE “DIFFICULTY” OF A PERSONALITY 
INVENTORY ITEM 


CHARLES HANLEY¹
Michigan State University 


THE concept of "item difficulty," so important in the theory and
construction of tests, seems at first sight to have little relevance
for personality inventories. Yet anyone who has responded to a
questionnaire knows how much harder some items are to answer than
others. The difficult question in the inventory may stimulate
reticence, seem ambiguous, or ask for something hard to estimate in
oneself. Perhaps one wavers between defensiveness and candor, or
finds the option of "true" or "false" inappropriate for categorizing
complex personal attitudes.

Because inventory items have no universally correct answers, the
assessment of difficulty requires a method different from that
employed with tests. Fortunately, it is well known in psychophysics
(Woodworth, 1938) that the more difficult a judgment is to make,
the longer it takes. (When two weights are nearly equal, for
example, deciding which is heavier is slower work than when the
weights differ markedly.) If judging the personal applicability of
inventory statements is similar to judging physical dimensions, the
more difficult personality items will have longer response
latencies. Similarly, just as the more difficult psychophysical
judgments are accompanied by lowered confidence by the judge in his
decision, so difficult personality items should be associated with
lowered confidence by the individual in his answers.

The present study involves a first approximation to the
measurement of response latencies for typical inventory items. A
demonstration of differing latencies for items, however, would not
complete the analogy with psychophysics. The experimental
psychologist has, besides response latency, another measure of
difficulty: the physical similarity between the stimuli to be judged.
What possible second measure of difficulty can be studied with
personality items?

¹ The data were collected and analyzed while the author was a member
of the visiting faculty of Hollins College.

Most inventory items of the type used in the Minnesota
Multiphasic Personality Inventory (MMPI) are answered in a particular
way by the large majority of normal subjects. Call these "standard"
items. On other items, however, normal subjects disagree in their
responses. Fricke (1957) calls such items "controversial," a term
he applies to items endorsed by 40-60 per cent of subjects.
Controversial and standard items, on the average, should differ in
difficulty. A number of reasons suggest that the controversial type
will be the more difficult.

When subjects guess at the answer to a difficult item, it is
reasonable to think their choices will approximate a 50/50 split
between true and false. Moreover, it is widely assumed that
controversial items are more influenced by the response set,
acquiescence (Wiggins, 1962). This response set, according to
Cronbach (1946), appears when test items are difficult:


"Response sets have the greatest influence on score in
ambiguous or unstructured situations. If a situation is structured
for the student, so that he knows the answer required, he
responds directly to the content of the item, and response sets
probably are unimportant. Acquiescence appears on difficult
true-false items . . ." (p. 483)


If, as Wiggins (1962) holds, acquiescence is more likely with
controversial than with standard items; and if, as Cronbach (1946)
states, acquiescence is elicited by difficult items; then controversial
items will typically be more difficult than standard items. The
present study, therefore, compares response latencies for standard
and controversial items to determine the usefulness of the concept
of item difficulty in connection with personality inventories.


Procedure 


Personality Items. The items, taken from the MMPI, were
reproduced by the Ditto process in a five-page booklet. The first page
contained instructions, modified from the MMPI booklet so as to
allow the use of a special answer sheet on which latencies could be
recorded, followed by 11 items from the first page of the MMPI.
Each of the remaining four pages contained 16 items; these were
critical for the study. On one page all items were standard and short
in length, none containing more than 11 words. There was one page
of standard-long items; these ranged from 12 to 25 words in length.
There was one page of short-controversial items, and a page of
long-controversial items. The first item on the short-controversial
page had exactly the same number of words as the first of the
short-standard items, and this matching of items for length and
position on the page was carried out for all standard and
controversial entries.

Controversial items were selected from 36 endorsed by 40-60 
per cent of Hathaway and McKinley’s MMPI normative group of 
113 normal college females. Standard items then were chosen to 
match for length; half were endorsed by 20-or-less and half by 
80-or-more per cent of the normative group. 

The four pages of items critical in the study could be arranged in 
24 different orders. The booklets were assembled so that each of the 
orders was used an equal number of times. Page numbers then were 
added by hand. 

The answer sheets differ from ordinary true-false blanks in two 
ways: (1) items were numbered by pages rather than continuously 
through the inventory; (2) at the end of each page’s section of the 
answer sheet was a blank labeled “Time Number.” The subject 
answering a page of the inventory filled in this number when fin- 
ished with that page’s questions. The difference between the time 
number at the end of a page and the time number just preceding 
measures the time required to finish the page. 

Timing. The “clock” was a deck of 5 X 8 inch file cards, num- 
bered from 1 through 200, held by the investigator, who was in 
front of the subjects, so that one number appeared at a time. Shortly 
after the students began answering questions on the first page of the 
inventory, the experimenter, with the help of a stop watch, began 
turning the cards at the rate of one every five seconds. The time 
number for the end of a given page may be multiplied by five to 
give the number of seconds a subject required to get that far in the 
inventory.
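
The conversion from recorded time numbers to seconds per page can be sketched as below. The function name and the sample time numbers are hypothetical. The only facts taken from the text are that a card was turned every five seconds and that a page's time is the difference between its time number and the preceding one.

```python
SECONDS_PER_CARD = 5  # one card turned every five seconds

def page_seconds(time_numbers):
    """Seconds spent on each page after the first, given the time
    number recorded at the end of every page in the order answered."""
    return [
        (later - earlier) * SECONDS_PER_CARD
        for earlier, later in zip(time_numbers, time_numbers[1:])
    ]

# Hypothetical subject: time numbers written at the end of the
# instruction page and of the four critical pages.
recorded = [12, 30, 55, 71, 96]
print(page_seconds(recorded))
```

Because only the card showing at the moment of writing is recorded, each page time carries an uncertainty of up to one five-second unit at either end.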


Confidence Judgments. After all subjects had finished answering
the inventory, a matter of approximately 20 minutes, they were
instructed to go over the questions again and circle on the answer
sheet the numbers of the five questions about which they were the
most doubtful of their answers.

Subjects. These undergraduate women were students in an
introductory psychology class at Hollins College; they are
above-average in socio-economic status. While 74 participated in the
study, the protocols of 72 were used, because this number provides
equal representation of each of the different orders in which the four
critical pages were arranged.
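
The counterbalancing arithmetic can be illustrated with a few lines. The page labels are shorthand introduced here; the figures of four critical pages, 74 participants, and 72 protocols used are from the text.

```python
from itertools import permutations

# The four critical pages, labelled for illustration.
pages = [
    "short-standard",
    "long-standard",
    "short-controversial",
    "long-controversial",
]

orders = list(permutations(pages))          # every possible page order
per_order, leftover = divmod(72, len(orders))
dropped = 74 - 72                           # protocols not used

print(len(orders), per_order, leftover, dropped)
```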

The subjects were told that the inventory was provisional, that
information was wanted as to its operating characteristics in a
normal group, and that the timing information was needed in case
the inventory had to be given in a situation where time was short.

The results of the study were communicated to the subjects later in 
the semester. 


Results


Response Latency. The analyses employed the five-second time
units, but the means for the four different lists, shown in Table 1,
are given in seconds.

... short categories included to check the sensitivity of the timing.
... standard items. For all controve...


TABLE 1
Mean Seconds Required to Finish Critical Pages of Items

[Table body not recoverable from the scan. Rows give item length and
a total; columns give type of item (standard, controversial) and a
total. The only legible entry is 107.6.]



The sign test is relatively crude, but the factorial design of the
study makes the use of analysis of variance possible. The balancing
of orders tends to increase the size of the error term (Lindquist,
1953, p. 163), assuming order effects occur, and increases the chance
of a Type II error. Thus the analysis is conservative. The results,
using scores transformed logarithmically² to reduce the skewness
characteristic of latency measures, appear in Table 2. They confirm
the earlier analysis employing the sign test. There appears, in
addition, an interaction between item length and item
controversiality, but not at a level of significance as great as found
for the main effects. Table 1 indicates that the difference between
the controversial and standard items is greater when they are short.
One explanation of the interaction is that long items, apart from
requiring more reading time, are harder to comprehend than short
ones, whether controversial or not.


TABLE 2
Analysis of Variance of Time Scores for Critical Pages of Items

Source               Sum of Squares    df    Mean Square    F

Length of Item           19,718          1      19,718      478.6**
Type of Item              1,062          1       1,062       25.78**
Subjects                 24,408         71         345        8.37**
Length X Type               344          1         344        8.35*
Length X Subjects         2,195         71          30.9      0.75
Type X Subjects           3,444         71          48.5      1.18
Triple Interaction        2,925         71          41.2
Total                    54,156        287

** p < .001
 * p < .01
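
The F ratios in Table 2 can be reproduced from the printed mean squares if the triple interaction mean square is taken as the error term. That choice is an inference from the table rather than something the text states, so the sketch below should be read as a consistency check only.

```python
# Mean squares as printed in Table 2 (transformed time scores).
mean_squares = {
    "Length of Item": 19718.0,
    "Type of Item": 1062.0,
    "Subjects": 345.0,
    "Length X Type": 344.0,
    "Length X Subjects": 30.9,
    "Type X Subjects": 48.5,
}

# Assumed error term: the triple interaction mean square.
error_mean_square = 41.2

f_ratios = {
    source: ms / error_mean_square for source, ms in mean_squares.items()
}

for source, f in f_ratios.items():
    print(f"{source:18s} F = {f:.2f}")
```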

Confidence Judgments. The association between judgments of
"doubtful" and the main variables can be tested by determining
whether a subject gave more doubtful judgments to short or to long
items, and to controversial or to standard items. The sign test
applied to these data indicates the controversial items drew more
judgments of doubtful than did standard items (p < .05), while
long items more often were judged doubtful than were the short
(p < .01). The first result agrees with the original hypothesis, while
the second harmonizes with the rationalization given the interaction
between item length and controversiality.

² Time scores (x) were transformed into a form convenient for
computation by the formula: 100 log (x + 1) - 100.
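
The logarithmic transformation of the time scores mentioned in the footnote can be sketched as follows, reading the formula as 100 log (x + 1) - 100 with x the time score in five-second units. The base of the logarithm is not stated in the source; base 10 is assumed here, and the function name is illustrative.

```python
import math

def transform_time_score(x):
    """100 * log10(x + 1) - 100 for a time score x.
    Base 10 is an assumption; the source does not give the base."""
    return 100.0 * math.log10(x + 1) - 100.0

# Under this assumption a time score of 9 units maps to 0
# and a time score of 99 units maps to 100.
for x in (9, 20, 99):
    print(x, round(transform_time_score(x), 1))
```

The transform compresses long latencies more than short ones, which is how it reduces the positive skew typical of timing data.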


Discussion 


The results show that the concept of item difficulty is useful with 
personality inventory items. Difficulty can be measured by response 
latency and confidence ratings, and is associated with item contro- 
versiality, held by Wiggins (1962) and others to indicate item 
ambiguity. Sheer item length also appears to contribute to item 
difficulty, although this is an idea derived from the present study 
rather than tested by it. 

Difficulty, of course, is not the prerogative of controversial and
long items. Several short-standard items were often judged as
doubtful. A most difficult item for some subjects, as revealed by
their spontaneous comments after testing, was the short-standard
"I loved my mother." This item, probably an easy one for adults to
answer in the 1930s, poses difficulties nowadays for college students
whose mothers rarely are dead. The implications in the question, for
those students who had trouble with it, were either: "I used to love
my mother, but I don't any more," or "I loved my mother and still
do." A student who conjures up these two interpretations finds the
question hard to answer.


The ambiguity ... into the study of ... clarity or ambi... of
wording, ... controversial or standard items for measuring response
latency, as in the present case, has the advantage of keeping the
experimental approximation close to the real-life testing procedure.
Still, it is probable ... basis if the pencil-and-paper format is ...
the memory drum.

... a study it might be expected that three influences will
affect answers to difficult items, influences that should be absent
with easy items. First, the difficult items (those with long latencies)
may indicate "complexes" of the kind Jung searched for with the
association test. Second, the long latencies may be diagnostic of
lying, as Wertheimer long ago suggested with free association. The
one study of such "lies" on the MMPI (Calvin & Hanley, 1957) is
concerned with their detection rather than with the kind of items
on which they occur. Both of these influences, of course, suggest the
use of the inventory in a way Cattell (1957) has proposed for
personality tests, that is, changing the inventory into an "objective
test."

The third possible influence on answers to difficult items is
acquiescence, again suggested by earlier work. With the present
short lists of items, it is hard to determine if the different types of
items vary in susceptibility to the set. The data in Table 3,
therefore, are hardly more than suggestive. They consist of
Kuder-Richardson Formula 20 reliabilities for the lists in the study,
scored on the special answer sheets for the Hollins sample, and on
standard answer sheets for the university men and women (from
Michigan State University classes in introductory psychology in 1954)
and male prisoners (tested routinely upon entering the Southern
Michigan Prison at Jackson).

The results with the long-controversial list consistently agree with
the notion that difficult items elicit acquiescence, but the data for
the short-controversial items are less consistent. To settle the
question, much longer lists of items are needed, if the present timing
procedure is used. Unfortunately, the MMPI pool of controversial
items is pretty well used up by even this small sampling. A better
approach would be to employ the individual determination of
response latencies for individual subjects as proposed earlier.
Suitable analyses can be devised to decide whether or not the
individual subject employs a different response "style" (Jackson &
Messick, 1958) with difficult and with easy items.
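
For readers who want to see how a Kuder-Richardson Formula 20 coefficient of the kind reported in Table 3 is obtained, here is a minimal sketch. The response matrix is invented for illustration, each item is scored 1 when answered in the keyed direction (for acquiescence, "true"), and the variance of total scores is computed with N in the denominator, which is an assumption about the convention used.

```python
def kr20(responses):
    """Kuder-Richardson Formula 20 for rows of 0/1 item scores
    (one row per subject, one column per item)."""
    n_subjects = len(responses)
    n_items = len(responses[0])

    # Proportion of subjects scored 1 on each item.
    p = [sum(row[i] for row in responses) / n_subjects for i in range(n_items)]
    sum_pq = sum(pi * (1.0 - pi) for pi in p)

    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_subjects
    # Population variance of total scores (divides by N; assumed).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_subjects

    return (n_items / (n_items - 1)) * (1.0 - sum_pq / var_total)

# Invented data: 4 subjects by 3 items, 1 = answered "true".
example = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(example))
```

With lists as short as 16 items, coefficients near zero or slightly negative, like several in Table 3, are to be expected when the items share little common variance.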


TABLE 3
Internal Consistency (K-R 20) of Acquiescence on Critical MMPI Items

                                        Type of Item
                        Short      Long       Short           Long
Sample                  Standard   Standard   Controversial   Controversial

72 Hollins women          .03        .18         .06             .30
68 MSU women             -.24        .05        -.12             .45
[illeg.] MSU men          .00        .22         .39             .57
106 male prisoners      [illeg.]     .01         .25           [illeg.]

Summary 


It has been shown that MMPI items differ in difficulty, as meas- 
ured by response latency and by post-test confidence judgments. 
Longer items are more difficult than short ones, and this difference 
may be due to factors beyond reading time. Controversial items, 
that is, items answered true or false in nearly the same proportions 
by normal subjects, are more difficult than standard items. Item 
difficulty may be related to the response set, acquiescence, although
the evidence in the present study is sketchy on this point. 


REFERENCES 


Calvin, A. D. and Hanley, C. "An Investigation of Dissimulation
  on the MMPI by Means of the 'Lie Detector.'" Journal of
  Applied Psychology, XLI (1957), 312-316.

Cattell, R. B. Personality and Motivation Structure and
  Measurement. Yonkers-on-Hudson, New York: World Book Company,
  1957.

Cronbach, L. J. "Response Sets and Test Validity." EDUCATIONAL
  AND PSYCHOLOGICAL MEASUREMENT, VI (1946), 475-494.

Fricke, B. G. "A Response Bias (B) Scale for the MMPI." Journal
  of Counseling Psychology, IV (1957), 149-153.

Jackson, D. N. and Messick, S. "Content and Style in Personality
  Assessment." Psychological Bulletin, LV (1958), 243-252.

Lindquist, E. F. Design and Analysis of Experiments in Psychology
  and Education. Boston: Houghton Mifflin, 1953.

Wiggins, J. S. "Strategic, Method and Stylistic Variance in the
  MMPI." Psychological Bulletin, LIX (1962), 224-242.

Woodworth, R. S. Experimental Psychology. New York: Henry
  Holt, 1938.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 3, 1962


ELECTRONIC COMPUTER PROGRAMS AND 
ACCOUNTING MACHINE PROCEDURES 


Edited by 


WILLIAM B. MICHAEL 
University of California, Santa Barbara 


Methods of Computing Correlation Matrices on the IBM 1620.
NATHAN JASPEN

A Fortran Program for Multiple Regression Scores. EDWARD
LEVONIAN AND RAYMOND GREGORY




In view of the tremendous advances that have been made in the 
adaptation of electronic computers and accounting machines to the 
processing of statistical data, sections of the Spring and Autumn is- 
sues of EDUCATIONAL AND PSYCHOLOGICAL MEASURE- 
MENT are devoted to the publication of such programs as are 
appropriate to psychometric procedures. Programs relevant to such 
problem areas as factor analysis, item analysis, multiple regression 
procedures, the estimation of the reliability and validity of tests, 
pattern and profile analysis, the analysis of variance and co- 
variance, discriminant analysis, and test scoring will be consid- 
ered. Customarily a program should be expected not to exceed 
six or eight printed pages. Manuscripts of four or fewer printed 
pages are preferred. Each manuscript will be carefully reviewed as 
to its suitability and accuracy of content. In some instances an
accepted paper may be returned to the author for possible revisions
or shortening. The cost to the author will be fifteen dollars per page
for regular running text. The extra cost of the composition of tables
and formulas will be added to the basic rate. Manuscripts received
up to November first will be considered for the Spring issue;
manuscripts received between then and May first will be considered
for the Autumn issue.


All correspondence should be directed to 
William B. Michael 
Professor of Education and Psychology 
University of California, Santa Barbara 
University, California 






METHODS OF COMPUTING CORRELATION MATRICES 
ON THE IBM 1620 


NATHAN JASPEN 
New York University 


The recently developed IBM 1620 is a variable word-length
computer, available in several sizes ranging from 20,000 digits to 
100,000 digits. The entire memory can function as one gigantic 
accumulator, or as thousands of small individual accumulators, as 
the programmer wishes. A datum can consist of any number of 
digits from one up to the entire memory capacity of the computer; 
an instruction, on the other hand, is uniformly 12 digits. 

The programmer specifies word length by a combination of two
techniques: he sets a flag at the left end of the word, and he
addresses the word by the address of its right-most digit. If a flag is
set over position 10001, and if no other flags have been set between
10001 and 20000, then the instruction “Multiply 20000 by 20000”
places the square of the 10000 digit number between 10001 and
20000 in the product area.
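The flag-and-address convention described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the names `digits`, `flags`, and `read_field` are hypothetical, storage is a short Python list rather than real 1620 memory, and no claim is made about how the machine implements the scan.

```python
# Toy model of flag-delimited, variable-length fields (not an emulator).
# A field is addressed by its right-most digit and extends leftward
# to the first position that carries a flag.

def read_field(digits, flags, address):
    """Return the integer stored in the field whose right-most digit is at `address`."""
    start = address
    while not flags[start]:              # scan left until the flag is found
        start -= 1
        if start < 0:
            raise ValueError("no flag found to the left of the address")
    return int("".join(str(d) for d in digits[start:address + 1]))

digits = [0, 0, 1, 2, 3, 0, 4, 5]        # eight storage positions
flags = [False] * len(digits)
flags[2] = True                          # field A occupies positions 2-4
flags[5] = True                          # field B occupies positions 5-7

a = read_field(digits, flags, 4)         # 123
b = read_field(digits, flags, 7)         # 45
product = a * b
```

Moving a flag changes the word length without changing the instruction that addresses the field, which is the property the methods in this paper rely on.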

The purpose of this paper is to present several methods of com- 
puting cross-products that take advantage of the peculiar capabili- 
ties of the 1620. These techniques are applicable also to the larger 
variable-length computers. Comparisons are made with the IBM 
650, a fixed word-length computer. These comparisons are relevant 
because there are more IBM 650's extant than all other computers
combined.

The computation of a correlation matrix may be divided into two
parts. The first part consists of accumulating all the cross-products
as each individual’s scores pass through the computer. This is the
slow and tedious part of the job. In any given computer, the speed
of this process depends on the size of the matrix and the number
of cases. The second part consists of calculating the correlations
from the cross-products. This begins after the last individual’s set
of scores has been read into the computer. The length of this
second step is independent of the number of cases, and is a function
only of the size of the matrix.

Since the process of accumulating cross-products is a function of
the sample size, it is a much lengthier procedure than the calculation
of the correlations. We shall, therefore, concentrate on this aspect
of the job.
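The two parts of the job can be sketched in a modern language as follows. This is a minimal illustration, not the 1620 program: the function names `accumulate` and `correlations` and the sample data are hypothetical, and the usual raw-score formula for the product-moment correlation is assumed.

```python
import math

def accumulate(cases):
    """Part 1: one pass over the cases, accumulating N, sums, and cross-products."""
    v = len(cases[0])
    n = 0
    sums = [0] * v
    cross = [[0] * v for _ in range(v)]
    for scores in cases:                 # work grows with the number of cases
        n += 1
        for i in range(v):
            sums[i] += scores[i]
            for j in range(v):
                cross[i][j] += scores[i] * scores[j]
    return n, sums, cross

def correlations(n, sums, cross):
    """Part 2: work depends only on the size of the matrix, not on N."""
    v = len(sums)
    # n * sum(xy) - sum(x) * sum(y), i.e. n squared times the covariance
    cov = [[n * cross[i][j] - sums[i] * sums[j] for j in range(v)] for i in range(v)]
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(v)]
            for i in range(v)]

cases = [(1, 2, 9), (2, 4, 7), (3, 6, 8), (4, 8, 2)]
r = correlations(*accumulate(cases))
```

The inner double loop of `accumulate` is the part that the four methods below try to speed up or compress.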


Method 1. Address Modification 


In order to multiply A by B, several instructions are required,
depending on the size of the computer. The IBM 650 requires 4
instructions, as follows:

1. Place A in the accumulator.
2. Multiply by B.
3. Add to this product the previously accumulated product ΣAB.
4. Store the updated product ΣAB in its allocated cell.


Since the 1620 has no accumulators, or, more exactly, since it is all
accumulators, the above process requires only two instructions:

1. Multiply A by B.
2. Add this product to the previously accumulated product ΣAB.


Accumulative multiplication is ordinarily a two-step process on
the 1620 because the multiply instruction clears the product area
before placing the product there. The product area is a specific 20
digit area (storage positions 80-99 inclusive). If the product to be
formed is larger than 20 digits, the left portion of the product will
cumulate to the storage positions to the left of the product area.
Therefore these positions are cleared by the programmer before the
multiply instruction. It is possible to capitalize on this idiosyncrasy
to accomplish accumulative multiplication in one instruction. Suppose
that the factors to be multiplied are two or three digit numbers.
If each factor is imbedded in a 15 digit number, 10 zeros to the right
of the factor and 3 or 2 to the left, the accumulative product will
form in the 10 positions immediately to the left of the usual product
area. These 10 positions should, of course, be initially cleared before
the cumulation begins. However, the multiplication of 15 digit
numbers is considerably more time-consuming than the multiplication
of three-digit numbers. Furthermore, the problem may require
that the successive products be cumulated to different amounts.
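The arithmetic behind the one-instruction trick can be checked with ordinary integers. The sketch below models only the arithmetic, on the assumption stated in the text that digits overflowing the 20-digit product area are added into the positions to its left; the names `SHIFT`, `PRODUCT_AREA`, and `embedded_product` are hypothetical.

```python
SHIFT = 10 ** 10          # ten zeros to the right of each factor
PRODUCT_AREA = 10 ** 20   # capacity of the 20-digit product area

def embedded_product(a, b):
    """Multiply two factors, each padded with ten trailing zeros.

    Returns (overflow, product_area): the digits that spill to the left
    of the product area, and the digits left inside it.
    """
    full = (a * SHIFT) * (b * SHIFT)     # equals a * b * 10**20
    return divmod(full, PRODUCT_AREA)

total = 0                                # positions left of the product area, cleared
for a, b in [(12, 345), (67, 89), (999, 999)]:
    overflow, product_area = embedded_product(a, b)
    total += overflow                    # the spill-over cumulates
```

Because every padded product is a multiple of 10^20, the product area itself always ends up holding zeros and the whole of A times B lands in the cumulating positions.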

Suppose that we wish to accumulate a 40 X 40 matrix. This 
requires 1600 cells for the 1600 accumulative products, which we 
will limit, for the purpose of discussion, to 10 digits each. It also 
requires 1600 sets of instructions, the number of instructions in each 
set depending on the type of computer. The 650 would require four 
instructions per set, while the 1620 would require two per set. Since 
the standard 650 has room for only 2000 instructions and accumula- 
tive products combined, this method of writing straight line instruc- 
tions for the 1620 far exceeds the capacity of the 650. It would be 
possible to have a program of 3200 instructions for a large 1620,
except that no one would bother to write such a large program, and 
its usefulness would be limited exclusively to 40 X 40 matrices. 
Therefore, for the 650 as a necessity, and for the 1620 as a conven- 
ience, we resort to address modification. 

An outline of the instructions for the 1620 might be somewhat as
follows:

1. In instructions 9 and 12 below, enter 40, the size of the matrix.
2. In instruction 6 below, let the accumulative cell be AA.
3. In instruction 5 below, let the first variable be A.
4. In instruction 5 below, let the second variable be A.
5. Multiply A by A.
6. Accumulate the product to cell AA.
7. Update the accumulative cell in 6 by one (to AB).
8. Update the second variable in 5 by one (to B).
9. Has this variable now been updated 40 times? If yes, go to
instruction 10. If no, go back to 5, which now reads: Multiply
A by B.
10. After the second variable has been updated the proper number
of times, reinitialize it to A.
11. Update the first variable in 5 to B.
12. Has the first variable been updated 40 times? If yes, go to
instruction 13. If no, go back to 5, which now reads: Multiply
B by A.
13. Continue. (Obtain the sums, add 1 to the card count, read the
next card.)




This type of modification involves five phases: initialization 
(steps 1, 2, 3, and 4), execution (steps 5 and 6), address modification 
(steps 7, 8, and 11), testing (steps 9 and 12), and reinitialization 
(step 10). Only the execution phase is really pay dirt. The computer 
will go through steps 5 and 6 a total of 1600 times, but it will also be 
going through the modification and testing steps many, many times. 
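The overhead of the modification and testing steps can be made concrete with a small simulation of the outline. This is a sketch only: ordinary index variables stand in for modified instruction addresses, the function name `accumulate_case` is hypothetical, and the step counts follow the thirteen-step outline rather than any measured 1620 timing.

```python
def accumulate_case(scores, cells):
    """Accumulate one case's cross-products, counting how often each phase runs."""
    v = len(scores)                                   # step 1: size of the matrix
    counts = {"execution": 0, "modification": 0,
              "testing": 0, "reinitialization": 0}
    cell = 0                                          # step 2: accumulative cell
    first = 0                                         # step 3: first variable
    second = 0                                        # step 4: second variable
    while True:
        cells[cell] += scores[first] * scores[second]  # steps 5 and 6
        counts["execution"] += 1
        cell += 1                                     # step 7
        second += 1                                   # step 8
        counts["modification"] += 2
        counts["testing"] += 1                        # step 9
        if second < v:
            continue
        second = 0                                    # step 10
        counts["reinitialization"] += 1
        first += 1                                    # step 11
        counts["modification"] += 1
        counts["testing"] += 1                        # step 12
        if first == v:
            return counts                             # step 13

cells = [0] * 9
counts = accumulate_case([1, 2, 3], cells)
```

For 40 variables the same counting gives 1600 executions of steps 5 and 6, accompanied by 3240 modifications, 1640 tests, and 40 reinitializations per card.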

While this method is not at all economical of time, it is very 
economical of space. Nearly the entire storage area can be devoted 
to cross-products. The instructions that the programmer needs to 
write are relatively few. A straight line program requiring no 
address modification would be much faster, but it would use up 
several times as much space as that devoted to cross-products (twice 
as much on the 1620, four times as much on the 650). 


Method 2. Single Row Product 


Because word length on the 1620 is unlimited, and because the
entire memory serves as an accumulator, it is possible to form an
entire row of products at one time. If there are 40 variables, and if
the accumulative cross-products will be 10 digits each, a 400 digit
word should be formed consisting of the 40 scores each imbedded in
a 10 digit number, the number containing as many zeros to the left
as necessary. This 400 digit word is then multiplied by the first
score, and then cumulated to the appropriate 400 digit accumulative
row.

The advantage of this method is that only 120 straight line in- 
structions are needed to obtain all the products of a 40 by 40 matrix 
(Clear, multiply, cumulate, each 40 times). If address modification 
is used, the scheme would be simpler than in Method 1, but of course 
more time would be required than the straight line method. 

The disadvantage of this method is that the multiplication of
large numbers is a slow procedure on the 1620. Because of the
peculiar way the 1620 multiplies, to multiply by zero is just as time
consuming as to multiply by 9. On the other hand, this method
deserves consideration for faster variable word length computers
than the 1620.
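The single row product can be imitated with arbitrary-precision integers. The sketch below is illustrative only: the helper names `pack`, `unpack`, and `accumulate_rows` are hypothetical, and it assumes, as the text does, that no accumulated cross-product exceeds the 10 digits allotted to it.

```python
FIELD = 10  # digits reserved for each accumulative cross-product

def pack(scores, field=FIELD):
    """Imbed each score in a `field`-digit number and join them into one word."""
    word = 0
    for s in scores:
        word = word * 10 ** field + s
    return word

def unpack(word, count, field=FIELD):
    """Split a packed word back into `count` fields, left to right."""
    base = 10 ** field
    values = []
    for _ in range(count):
        word, value = divmod(word, base)
        values.append(value)
    return values[::-1]

def accumulate_rows(cases):
    """One multiply and one add per row per case, as in Method 2."""
    v = len(cases[0])
    rows = [0] * v                      # cleared accumulative rows
    for scores in cases:
        word = pack(scores)
        for i, s in enumerate(scores):
            rows[i] += word * s         # a whole row of products at once
    return [unpack(r, v) for r in rows]

matrix = accumulate_rows([(1, 2, 3), (4, 5, 6)])
```

With 40 variables each packed word is 400 digits long, so each of the 40 multiplications per case handles a 400-digit operand.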


Method 3. The Matrix Product Method 


It is possible on a desk calculating machine to form two digits,
like 8 and 9, into a single word, 80009, and then square it. The
product is 6401440081. The product, of course, consists of the
squares of 8 and 9 and twice their cross-product. The method of
squaring cannot be usefully extended to more than two variables.

On the other hand, consider the product of 80009 by 809. The
product is 64727281. Here we have the entire cross-product matrix:
A², AB, AB, and B². This method can be extended to any number of
variables. For instance, on a desk calculating machine, multiply
1002003 by 123. The product is 123246369, the entire three by three
cross-product matrix of 1, 2, and 3 considered as single-digit
variables. If we had a larger desk calculating machine, we would be
able to multiply 7000008000009 by 70809 to obtain 495663566472637281,
the cross-product matrix of the nine two-digit products of 7, 8,
and 9.

To form the two factors, there are two rules. Since ten-digit 
products are anticipated, one factor, or word, has each variable 
imbedded in a ten-digit number. The second factor may be formed 
by imbedding each score in a number of length VL, where V is the 
number of variables and L is the length of each product. 

Now let us obtain the cross-product of our 40 variables. Assume
that we will allow 10-digit cumulative cross-products. Factor 1 will
consist of a 400 digit word, each variable being imbedded in a 10
digit number. Factor 2 will consist of a word of 16000 digits, each
of the 40 variables being imbedded in 400 digit numbers. The
product of the 400 digit number by the 16000 digit number will be
16400 digits, of which the first 400 will all be zeros. The remaining
16000 digits will be the 1600 cross-products, each of 10 digits.
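The two rules for forming the factors can be verified with a short sketch. The names `pack` and `cross_product_matrix` are hypothetical, and the code assumes that every cross-product fits in the L digits allowed for it; it reproduces the desk-calculator examples given above.

```python
def pack(scores, field):
    """Imbed each score in a `field`-digit number and join them into one word."""
    word = 0
    for s in scores:
        word = word * 10 ** field + s
    return word

def cross_product_matrix(scores, length):
    """Form all V*V cross-products with a single multiplication.

    `length` is L, the number of digits allowed for each product.
    """
    v = len(scores)
    factor1 = pack(scores, length)          # each score in L digits
    factor2 = pack(scores, v * length)      # each score in V*L digits
    product = factor1 * factor2             # the one multiply instruction
    base = 10 ** length
    cells = []
    for _ in range(v * v):                  # peel off L digits at a time
        product, cell = divmod(product, base)
        cells.append(cell)
    cells.reverse()
    return [cells[i * v:(i + 1) * v] for i in range(v)]

two_by_two = cross_product_matrix([8, 9], 2)
three_by_three = cross_product_matrix([7, 8, 9], 2)
```

Each pair of scores lands in its own L-digit slot because the exponent of ten attached to the product of score i and score j is different for every (i, j).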

After the two factors are formed, the program consists of one 
instruction: 


Multiply factor 1 by factor 2. 


The usual product area (storage positions 80-99) should be
avoided since this area clears before every multiplication.

The advantage of this method is the economy of instructions. The
disadvantage is the tremendous amount of time required to form
the product of large numbers. This method is deserving of
consideration for a speedier computer, and one that can multiply very
quickly by zero; but as far as the 1620 is concerned it must be
limited to small matrices.


Method 4. Program Generating Method 


In order to have a fast program it is necessary to eliminate 
address modification. A straight line program handling small prod- 
ucts in any one step is the fastest possible method for the 1620. The 
program would be written as follows: 


1. Multiply A by A.
2. Add the resulting product to the cumulative product AA.
3. Multiply A by B.
4. Add the resulting product to the cumulative product AB.
5. Multiply A by C.
6. Etc.

At the appropriate point the first variable would change from
A to B, and the second variable would be reinitialized to A. The
straight line program is all muscle.

However, every size matrix would require its own program. It 
would be very inconvenient to have to store so many programs. 
Therefore, a program has been prepared which automatically gen- 
erates this on-line program. It is necessary only to read in a parame- 
ter card which specifies the size of the matrix. This card precedes 
the detail cards. The parameter card causes the generation, in mem- 
ory, of a straight line program of the required length. In one recent 
test, 100 detail cards each of 20 variables cumulated into the 20 by 
20 cross-product matrix in a total of 80 seconds. This is many times 
faster than any 650 method. 

The advantage of this method is speed. The disadvantage is
space. Each cross-product requires two instructions. The limit of
the present method is 30 variables for a 40000 digit computer.

This limit will soon be expanded to 40 variables, by eliminating
the cells below the diagonal. This should also almost double the
speed. The output matrix will be a complete square, despite the fact
that only half the matrix is computed.
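The idea of generating a straight line program from a single size parameter can be sketched as follows. This is an analogy in Python, not the 1620 generator: the names `generate_program` and `load_program` are hypothetical, and Python source text stands in for the machine instructions built in memory.

```python
def generate_program(v):
    """Return the source text of a straight-line routine for a v-by-v matrix (v >= 1)."""
    lines = ["def accumulate(scores, cells):"]
    for i in range(v):
        for j in range(v):
            lines.append(f"    product = scores[{i}] * scores[{j}]")  # multiply
            lines.append(f"    cells[{i * v + j}] += product")         # add
    return "\n".join(lines) + "\n"

def load_program(v):
    """Generate the routine once, as the parameter card does, and return it."""
    namespace = {}
    exec(generate_program(v), namespace)
    return namespace["accumulate"]

accumulate = load_program(3)             # "parameter card": matrix size 3
cells = [0] * 9
for detail_card in [(1, 2, 3), (4, 5, 6)]:
    accumulate(detail_card, cells)       # no loops, tests, or index updates inside
```

The generated routine contains two statements per cross-product, which mirrors the space cost noted in the text: speed is bought with program length.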


Other Methods 
Although Method 1, Address Modification, is slow compared to
Method 4, the Program Generating Method, Method 1 can handle
substantially larger matrices. For a 40000 digit computer, the
largest matrix that can be handled on one pass is probably of the
order of 80 by 80.

A multi-pass method needs to be developed that will handle
substantially larger matrices. The “best” method will represent
some compromise in the struggle between time and space.

A program also needs to be developed that will handle incomplete 
cases. This program should probably be combined with one that 
handles rectangular matrices. Such a program would be much faster 
than the corresponding 650 program. 




SELF-SCORING ITEM ANALYSIS PROCEDURE FOR THE 
IBM 1620 


NATHAN JASPEN¹
New York University 


A single-pass item analysis program for use with the IBM 1620 
computer is described below. This program handles up to 300 items 
for up to 9999 examinees, and functions as follows: 


Input 1. Keys 


The correct answers to a multiple-choice test are punched into 
a set of IBM cards, according to the following layout: 


Test Code Columns 1-3 
“Scoring Key” Columns 4-14 
Card Number Column 20 
Correct Responses Columns 21-80 


For a maximum of 300 five-choice items, five cards are re- 
quired. In columns 4-14 of each card are punched the words 
“Scoring Key.” Items 1-60 are punched in columns 21-80 of card 
1, items 61-120 are punched in columns 21-80 of card 2, and so 
on. Since standard IBM answer sheets have 5 columns of 30 
items each on a side, it appeared convenient to allow 60 items, 
or two whole answer sheet columns, per card. 

If the number of items in the test is not an exact multiple of 
60, the remaining columns in the scoring key should be left 
blank. 


¹ Grateful acknowledgment is made to Mildred E. Katzell for her many fine
suggestions regarding this paper.


Input 2. Examinee Responses 


The responses on each answer sheet are punched into one or 
more cards, according to the following layout: 


Test Code Columns 1-3
Identification Columns 4-16 
Score Columns 17-19 
Card Number Column 20 
Responses Columns 21-80 


The score field (17-19) is optional, is ordinarily not necessary,
and may be left blank.

If the number of items in the test is not an exact multiple of 
60, the unused response columns should be left blank. 

It is not necessary to sort the answer sheets or pre-arrange 
them in any way. The cards for each individual should be 
together in card number order (column 20), but the individuals 
may be in any order. There must be as many cards for each
individual as there are scoring keys, that is, one card for each 60
items up to a maximum of five cards for 300 items. The indi-
vidual responses are distinguished from the scoring keys by the 
absence of the words "Scoring Key" in columns 4-14. The 
maximum number of individuals is 9999. 
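The two card layouts can be read with fixed column slices. The sketch below is an illustration under stated assumptions: the names `parse_card` and `rights_score` and the sample cards are hypothetical, responses are assumed to be punched as the digits 1-5, and key columns that are blank or hold other punches are simply left out of the rights score.

```python
def parse_card(card):
    """Split an 80-column card image into the fields of the layouts above."""
    card = card.ljust(80)
    is_key = card[3:14].upper() == "SCORING KEY"       # columns 4-14
    return {
        "test_code": card[0:3],                        # columns 1-3
        "is_key": is_key,
        "ident": None if is_key else card[3:16].strip(),  # columns 4-16
        "card_number": int(card[19]),                  # column 20
        "responses": card[20:80],                      # columns 21-80
    }

def rights_score(key_cards, response_cards):
    """Count responses that match the key, ignoring unkeyed columns."""
    key = "".join(c["responses"] for c in key_cards)
    answers = "".join(c["responses"] for c in response_cards)
    return sum(1 for k, a in zip(key, answers) if k in "12345" and a == k)

key_card = "MS1" + "Scoring Key" + " " * 5 + "1" + "12345"
response_card = "MS1" + "JONES".ljust(13) + " " * 3 + "1" + "12355"
score = rights_score([parse_card(key_card)], [parse_card(response_card)])
```

A deck is then a run of key cards followed by groups of response cards, one group per examinee, distinguished by the `is_key` test on columns 4-14.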


The output, in the ordinary mode of operation of the program,
consists of a lead card followed by one card for each item in the test,
up to 300 items. The lead card contains the following:

Test Code 

Number of Items 

Number of Individuals 

Mean 

Standard Deviation 


Each item card contains the following:

Test Code

Item Number

Correct Answer

P1, the proportion of individuals who selected Alternative 1

r1, the biserial r for Alternative 1 with Total Score

M1, the mean total score of the individuals who selected
Alternative 1

P2, r2, M2 for Alternative 2

P3, r3, M3 for Alternative 3

P4, r4, M4 for Alternative 4

P5, r5, M5 for Alternative 5

Po, ro, Mo for the omits

Pc, rc, Mc, a repetition of the P, r, and M for the keyed
answer
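The three statistics reported for each alternative can be sketched with the textbook formulas. This is an assumption about the computation, since the paper does not give its formulas: the biserial r is taken as ((Mp - Mt)/SDt)(p/y), with y the normal ordinate at the point cutting off proportion p, and the point biserial as ((Mp - Mt)/SDt) times the square root of p/q. The function name `alternative_stats` is hypothetical.

```python
from statistics import NormalDist, fmean, pstdev

def alternative_stats(chose, totals, point_biserial=False):
    """Return (P, r, M) for one alternative.

    chose[i] is True if examinee i selected the alternative;
    totals[i] is that examinee's total score.
    """
    selected = [t for c, t in zip(chose, totals) if c]
    p = len(selected) / len(totals)
    if not selected or len(selected) == len(totals):
        # r is undefined when nobody or everybody chose the alternative
        return p, None, (fmean(selected) if selected else None)
    m_selected = fmean(selected)
    z = (m_selected - fmean(totals)) / pstdev(totals)
    if point_biserial:
        r = z * (p / (1 - p)) ** 0.5
    else:
        y = NormalDist().pdf(NormalDist().inv_cdf(p))   # normal ordinate
        r = z * p / y
    return p, r, m_selected

chose = [True, True, False, False]
totals = [4, 3, 2, 1]
p, r_bis, m = alternative_stats(chose, totals)
_, r_pbis, _ = alternative_stats(chose, totals, point_biserial=True)
```

Running the function once per alternative, once for the omits, and once for the keyed answer yields the fields of an item card; note that the biserial coefficient can exceed 1 in small or non-normal samples.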


The output may then be listed on the IBM 407. A sample print 
out is shown in the illustration. 


The following alternatives may be elected: 


Set up 1. Eliminate the scoring, and use the score punched in
columns 17-19 of card 1. This score may be any number, not
necessarily a function of the items. This permits the use of a
continuous external criterion.

Set up 2. Punch out the test scores for each individual. A rights
score, a wrongs score, and a combined score (rights minus
one-third or one-fourth wrongs) are punched out for each
individual. This permits use of the program as a scoring program.
In its ordinary mode of operation (set ups 1 and 2 off)
the program is self-scoring, but the score is not punched out.

Set up 3. Eliminate the item analysis (for use if the program is
to be used exclusively for scoring).

Set up 4. Compute point biserial instead of biserial.


Figure 1

TEST: MED SURG PART A     ITEM: 15/180     KEY: 1

MT       SDT      NT       DATE
35.65    4.91     786      9/25/61

Alternative     p       r       m
1             .684     .36     037
2             .160    -.23     034
3             .023    -.18     034
4             .123    -.24     034
              .000    -.46     029


In addition to these set ups, the scoring keys may be manipulated
to eliminate items. A zero punched for a response in the scoring key
eliminates the item both from the total score of the individual and
from the item analysis. This is useful if item analyses are desired
for separate parts of a test. The parts need not consist of consecutive
items. A 9 punch in the scoring key eliminates the item from the
total score, but not from the item analyses. This is useful if it is
discovered that an item is ambiguous, but a distribution on the item
is still desired.

In addition to the card output, there is also typewriter output: a
count of the total number of items in the test, a count of the number
of items included in the score, and a count of the number of items
included in the analysis but not the score.

The speed of the program is approximately 60 cards of 60 items
each per minute. Punchout speed is about 80 item cards per minute.
A complete program for a 180-item test for 600 examinees requires
approximately 40 minutes of computer time, plus, perhaps, 15 minutes
of tabulator print-out time, and should cost no more than $50 at
regular commercial rates. The key-punching and verifying costs
for an analysis of that size would probably be close to $200. The
time figures cited here are quite different from those cited by Adams
(1960) for the 650.

The advantages of this method are the complete control over
accuracy of computation, and the speed of operation after the cards
have been keypunched.

The program described above represents a modification of a
program prepared by the writer for the IBM 650 (Jaspen, 1958).


REFERENCES 


Adams, James F. "The Use of the Electronic Computer for Item
Analysis." EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, XX
(1960), 611-613.

Jaspen, Nathan. "Some Applications of Electronic Computers to
Tests and Measurements." Paper presented at the Eastern
Psychological Association, Philadelphia, April 11, 1958.




A FORTRAN PROGRAM FOR PROPORTION OF 
VARIANCE IN MULTIPLE REGRESSION 


EDWARD LEVONIAN AND STANLEY AZEN
University of California, Los Angeles 


In multiple linear regression analysis, interest often focuses on
the proportion of variance in the dependent variable accounted for
by each of the independent variables. This proportion of variance
is represented by the product βr, where β is the standard partial
regression coefficient associated with a given independent variable,
and r is the correlation between that independent variable and the
dependent variable.

The primary problem involves the computation of the beta co- 
efficients. In the present program these are computed by pre-multi- 
plying (a) the column vector whose elements are the correlations 
between each independent variable and the dependent variable by 
(b) the matrix inverse to the matrix of correlations between the
independent variables.

Purpose. The purpose of this program is to compute the proportion
of variance in the dependent variable accounted for by m
independent variables in a problem for which the inverse matrix of
correlations between the independent variables is available. The
program finds its greatest usefulness in a problem which involves the
same m independent variables in n analyses involving n dependent
variables.

Program. The program involves four matrices: A, B, C, and D.
The (m X m) A matrix is the inverse of the matrix of correlations
between the independent variables. The (m X n) B matrix contains
the correlations between the m independent and n dependent
variables. The A and B matrices are considered by the program as data,
and they must be supplied by the user; these matrices are not
computed by this program from raw data.

The (m X n) C matrix contains the beta coefficients, and is
related to the A and B matrices by the following:

$$c_{ij} = \sum_{k=1}^{m} a_{ik} b_{kj}$$

where a, b, and c are elements of the A, B, and C matrices. The
$c_{i1}$ in the first column represent the beta coefficients associated
with the first dependent variable, etc.

The (m X n) D matrix contains the βr products. Each column of D
is the element-by-element product of the corresponding column
vectors in the B and C matrices. That is,

$$d_{ij} = b_{ij} c_{ij} \quad \text{for } i = 1, 2, \ldots, m; \; j = 1, 2, \ldots, n$$

Thus, the m elements in the first column of the D matrix represent
the proportion of variance in the first dependent variable accounted
for in terms of the m independent variables. The sum of this column
is R². These m values, R², and R are listed row-wise in that order
under the heading “COLUMN 1.” These values for the second
dependent variable are then listed under the heading “COLUMN 2,”
etc.
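The relations among the A, B, C, and D matrices can be sketched in a few lines. This is not the authors' FORTRAN program; the function name `beta_r_table` and the two-predictor example are hypothetical, and plain nested lists are used for the matrices.

```python
def beta_r_table(a_inv, b):
    """Compute C (betas), D (beta * r products), and R squared per dependent variable.

    a_inv: m x m inverse of the correlations among the independent variables.
    b:     m x n correlations between independent and dependent variables.
    """
    m, n = len(b), len(b[0])
    c = [[sum(a_inv[i][k] * b[k][j] for k in range(m)) for j in range(n)]
         for i in range(m)]                                  # C = A B
    d = [[b[i][j] * c[i][j] for j in range(n)] for i in range(m)]
    r_squared = [sum(d[i][j] for i in range(m)) for j in range(n)]
    return c, d, r_squared

# Two predictors correlating .5 with each other, and .6 and .4 with the criterion.
a_inv = [[4 / 3, -2 / 3],
         [-2 / 3, 4 / 3]]          # inverse of [[1, .5], [.5, 1]]
b = [[0.6],
     [0.4]]
c, d, r_squared = beta_r_table(a_inv, b)
```

In this example the two βr products are about .32 and .053, and their sum, about .373, agrees with R² computed directly from the three correlations.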

Limitations. The number of independent variables, m, cannot
exceed 50, and the number of dependent variables, n, cannot exceed
200. The routine can be used on any machine which has a FORTRAN
compiler and a core memory of 32K, such as the IBM 704, 709, or
7090. Input is from logical tape 5, output to logical tape 6. Computing
time (exclusive of listing time) on the IBM 7090 is about 3
minutes for a problem involving 50 independent variables and 200
dependent variables.


Card Preparation.


a. Problem Card
Columns 1-3 m (right-adjusted)
Columns 4-6 n (right-adjusted)
b. Data Format Card
Each entry of matrix A and B must follow the same format.
This format is punched on the Data Format Card, beginning
with column 1, and must be completed by column 72. Thus,
(12F6.2/5F6.2) punched in the first 14 columns of the Data
Format Card indicates that: (1) there are two data cards for
each column of matrix A and B, (2) the first entry on each
card begins in column 1, (3) each entry occupies a six-digit
field with the integer portion of the entry occupying 4
columns, and (4) m = 17.


c. Data Cards for Matrix A
The entries of matrix A are punched by columns as specified
by the Data Format Card. Thus, the first card (or group
of cards) represents a11 a21 . . . am1, the second card (or group
of cards) represents a12 a22 . . . am2, etc. The decimal point
need not be punched.

d. Data Cards for Matrix B
The entries of matrix B are also punched by columns. Thus,
the first card (or group of cards) represents b11 b21 . . . bm1,
the second card (or group of cards) represents b12 b22 . . . bm2,
etc.


Order of Cards.


-. System and Program Cards
a. Problem Card
b. Data Format Card
c. Data Cards for Matrix A
d. Data Cards for Matrix B


Availability. Copies of the program may be obtained from the 
senior author. 




A FORTRAN PROGRAM FOR MULTIPLE 
REGRESSION SCORES 


EDWARD LEVONIAN AND RAYMOND GREGORY
University of California, Los Angeles 


A multiple regression equation is often used to compute a given 
subject’s score for the dependent variable. The procedure involves 
multiplying the regression coefficient associated with a given inde- 
pendent variable by the subject’s score on that variable, summing 
these products, then adding a constant. 

Purpose. The purpose of this program is to perform the operations 
described in the paragraph above. 

Program. The program computes

$$C + \sum_{j=1}^{m} A_j X_{ij}$$

where C is the constant,
$A_j$ is the regression coefficient for the jth variable
(j = 1, 2, ..., m), and
$X_{ij}$ is the ith subject’s score on the jth variable
(i = 1, 2, ..., N).

The value of the expression above is computed for each subject,
and the results are written on logical tapes 6 and 10. The listing is
obtained from tape 6, and punched cards, if desired, are obtained
from tape 10.

A typical usage of this routine might involve a comparison of an
empirical value (Y VALUE) with an estimated value (Y ESTIMATE)
based on a multiple regression equation, both values of Y
pertaining to the ith subject. In fact, the routine was written with
this application in mind, and this is reflected in the form of the
output.

The output lists in order (a) the sample size, N, (b) the number
of independent variables, m, (c) the variable number and its
associated regression coefficient, $A_j$, for each of m variables, (d) the
constant, C, followed by (e) the subject number, Y VALUE, and Y
ESTIMATE for each of N subjects. The punched card output, if
requested, contains only (e), with one card per subject.
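The computation and the per-subject part of the output can be sketched briefly. This is an illustration rather than the FORTRAN routine: the function name `regression_scores` and the sample values are hypothetical, and each data row is assumed to hold the m scores followed by the Y VALUE, as the card layout below requires.

```python
def regression_scores(constant, coefficients, data):
    """Return (subject number, Y VALUE, Y ESTIMATE) for each subject.

    Each row of `data` holds the m scores followed by the empirical Y VALUE.
    """
    results = []
    for subject, row in enumerate(data, start=1):
        *x, y_value = row
        y_estimate = constant + sum(a * xi for a, xi in zip(coefficients, x))
        results.append((subject, y_value, y_estimate))
    return results

constant = 10.0
coefficients = [0.5, 2.0, -1.0]          # A_1, A_2, A_3
data = [(2, 3, 4, 12),                   # X_i1, X_i2, X_i3, Y VALUE
        (0, 0, 0, 0)]
listing = regression_scores(constant, coefficients, data)
```

Each tuple in `listing` corresponds to one line of item (e) in the output, so the empirical and estimated values can be compared subject by subject.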

Limitations. The number of independent variables, m, cannot ex- 
ceed 50, and the number of persons, N, cannot exceed 500. The 
routine can be used on any machine which has a FORTRAN com- 
piler and a core memory of 32K, such as the IBM 704, 709, or 7090. 
Input is from logical tape 5, output to logical tapes 6 and 10.
Computing time (exclusive of listing or card punch time) on the IBM
7090 is about 10+ seconds/variable/person. This amounts to about 
2.5 minutes for a problem involving 50 variables and 500 subjects. 


Card Preparation.


a. Problem Card 
There must be one Problem Card containing the following: 


Columns 1-3 m (right-adjusted) 
Columns 4-6 N (right-adjusted) 


b. Parameter Format Card 


There must be only one Parameter Format Card. This card
specifies the format of the input parameters: the $A_j$'s and C.
There are two restrictions on the parameter format: (1) the
$A_j$'s must be followed by C, albeit zero, and (2) the parameters
must be punched in floating point.

For example, (2F5.2/F5.2,F4.0) punched in the first 17
columns of the Parameter Format Card is appropriate for the
case in which (1) m = 3; (2) each $A_j$ occupies 5 columns,
with the last two columns for the fractional part, while C
occupies 4 columns and is an integer; (3) there are two
Parameter Cards, with the first Parameter Card containing
2 $A_j$'s and the second Parameter Card containing the third $A_j$
and C; and (4) the first $A_j$ is in columns 1-5 and the second
$A_j$ in columns 6-10 of the first Parameter Card, while the third
$A_j$ is in columns 1-5 and C in columns 6-9 of the second
Parameter Card.
c. Parameter Card (s) 

There is no restriction on the number of Parameter Card(s). 
This card(s) contains the m coefficients and the constant in 
the format specified by the Parameter Format Card. 

d. Data Format Cards 

There must be two Data Format Cards, even though the
second may be blank. There are three restrictions on the data
format: (1) the data, the $X_{ij}$'s, pertaining to a given i must
begin a new card; (2) the data must be punched in floating
point; and (3) the data for each i must be followed by the Y
VALUE for that i. If there is no interest in the Y VALUE,
zero punches or blank columns may be used, but these must
be anticipated by the Data Format Card.

For example, (6X,3F4.1,F2.0) punched in the first 15
columns of the first Data Format Card, and with the second
Data Format Card blank, is appropriate for the case in which
(1) m = 3; (2) each $X_{ij}$ occupies 4 columns, with the last
column for the fractional part; (3) Y VALUE is a 2-place
integer; and (4) there is one card for each i, with the first $X_{ij}$
in columns 7-10, the second $X_{ij}$ in columns 11-14, the third $X_{ij}$
in columns 15-18, and Y VALUE in columns 19-20.

e. Data Cards

There is no restriction on the number of Data Cards. These
cards contain the (m by N) $X_{ij}$'s in the format specified by
the Data Format Cards.
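How a format such as (6X,3F4.1,F2.0) maps card columns to values can be sketched with a small reader. This is a deliberately limited illustration, not a FORTRAN implementation: the function name `read_card` and the sample card are hypothetical, only nX and rFw.d items on a single card are handled, blanks are treated as zeros, and signed fields with leading blanks are not supported.

```python
import re

def read_card(card, fmt):
    """Read one card using a format made only of nX and rFw.d items."""
    card = card.ljust(80)
    values, col = [], 0
    for item in fmt.strip("() ").split(","):
        item = item.strip().upper()
        skip = re.fullmatch(r"(\d+)X", item)
        if skip:                                   # nX: skip n columns
            col += int(skip.group(1))
            continue
        field = re.fullmatch(r"(\d*)F(\d+)\.(\d+)", item)
        if field is None:
            raise ValueError(f"unsupported format item: {item}")
        repeat = int(field.group(1) or 1)
        width, decimals = int(field.group(2)), int(field.group(3))
        for _ in range(repeat):
            text = card[col:col + width].replace(" ", "0")
            col += width
            if "." in text:                        # a punched point takes precedence
                values.append(float(text))
            else:                                  # otherwise the point is implied
                values.append(int(text) / 10 ** decimals)
    return values

card = "SUBJ01" + " 123" + "0456" + "7.50" + "42"
values = read_card(card, "(6X,3F4.1,F2.0)")
```

The example card yields the three scores 12.3, 45.6, and 7.5 from columns 7-18 and the Y VALUE 42 from columns 19-20.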


Order of Cards.


-. System and Program Cards
a. Problem Card
b. Parameter Format Card
c. Parameter Card(s)
d. Data Format Cards
e. Data Cards
-. Repetition of a-e for problem 2, 3, ..., if desired


Availability. Copies of the program may be obtained from the 
senior author. 




IBM 650 RECTANGULAR CORRELATION MATRIX
PROGRAM FOR INCOMPLETE CASES 


NATHAN JASPEN 
New York University 


This paper describes a special correlation program for the IBM
650 computer which has several novel features. 


1. Ease of operation. The program requires a single-pass
operation on an 80-80 control panel, using a single card for each
subject. Up to 30 variables may be punched in the card. Of
these 30 variables, one variable (the first) may be up to 4
digits in length, 15 variables may be up to 3 digits in length,
and the remaining 14 variables are limited to 2 digits. The
card columns are as follows:


Description                 Number of Columns   Card Columns
Individual Ident. Number    3                   1-3
Variable 1                  4                   4-7
Variables 2-16              3 each              8-52
Variables 17-30             2 each              53-80


If fewer than 80 columns are punched, it is not necessary to 
zero out the card, but it is necessary to zero out the word or 
decade. For instance, if only 10 variables are punched, this 
takes us to column 34, and columns 35-40 must be zeroed out, 
either in the card, or by modification of the control panel. 

All scores must be positive or zero. No negative scores are 
permitted. 

2. One of the primary functions of the program is to handle
incomplete cases. Each correlation coefficient in the matrix
may be based on its own individual N. If a score for an
individual is missing, then all correlations involving that variable
have their N's reduced by one. Any number of scores for any
number of individuals may be missing.

A missing score is indicated by 99, 999, or 9999, depending
on whether it is punched in a 2, 3, or 4 digit field. The number
99 is therefore not a legitimate score in a 2 digit field. If the
range of test scores for a variable extends to 99, a 3 digit field
may be used, in which case a score of 99 would be punched 099,
and a missing score would be punched 999.
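The effect of the missing-score codes on each coefficient's N can be sketched as follows. This is an illustration, not the 650 program: the names `MISSING` and `pairwise_r` and the sample scores are hypothetical, and the ordinary raw-score correlation formula is assumed for the pairs that remain.

```python
import math

MISSING = {2: 99, 3: 999, 4: 9999}   # missing-score code for each field width

def pairwise_r(x, y, width_x, width_y):
    """Correlate two variables over the cases where neither score is missing.

    Returns (N, r), where N is the number of complete pairs.
    """
    pairs = [(a, b) for a, b in zip(x, y)
             if a != MISSING[width_x] and b != MISSING[width_y]]
    n = len(pairs)
    sx = sum(a for a, _ in pairs)
    sy = sum(b for _, b in pairs)
    sxx = sum(a * a for a, _ in pairs)
    syy = sum(b * b for _, b in pairs)
    sxy = sum(a * b for a, b in pairs)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return n, numerator / denominator

x = [10, 20, 99, 40, 50]             # punched in a 2 digit field; third case missing
y = [1, 2, 3, 999, 5]                # punched in a 3 digit field; fourth case missing
n, r = pairwise_r(x, y, 2, 3)
```

Here five cards are read, but this particular coefficient rests on the three cases with both scores present, so its N is 3 while other pairs of variables may keep a larger N.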

3. The program will produce a rectangular correlation matrix,
not exceeding 240 correlation coefficients, of any set of desig- 
nated variables against any set of designated variables. If all 
30 variables are punched in the data cards, and if it is de- 
sired to obtain the correlations of variables 10-20 against 
variables 10-20, a square matrix of 121 correlations will re- 
sult; but, just as easily, variables 1, 2, 3, 4, and 5 can be cor- 
related against variables 2, 4, 6, 8, 10, 12, 14, and 16. Also, 
variables 30, 10, and 20 can be correlated against variables 
30, 20, 10, and 25. The variables may be in any order. It is 
not required to repunch the data cards. 

In order to designate the row and column, two indicator
cards are punched. The layout of the indicator cards is as
follows:


Description                   Number of Columns    Card Columns
Study Ident. Number           3                    1-3
Zeros                         16                   4-19
Card Number (1 or 2)          1                    20
Identification of Variables   2 each               21-80


These two cards must be completely punched in all 80 columns,
with zeros if necessary to complete the card. Each variable
must be expressed as a 2-digit number. If there are fewer
than 30 variables, the code 00 signifies the end of the variables.


The limiting square matrix available from this program on
a single pass is 15 by 15. Limiting rectangular matrices are
30 by 8, or 24 by 10, or 20 by 12, and so on. Since the control
cards are entirely under the operator's control, a 30 by 30
matrix can be partitioned in any number of ways simply by
punching control cards.
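
The 240-coefficient limit and the idea of partitioning a larger matrix can be sketched as follows. The splitting rule shown here (keep all rows, cut the column list into the widest blocks that fit) is just one of the "any number of ways" the text mentions and is an assumption; `fits_single_pass` and `partition_columns` are hypothetical names.

```python
MAX_CELLS = 240  # most correlation coefficients available from one pass


def fits_single_pass(n_rows, n_cols):
    """True when an n_rows by n_cols matrix stays within the 240-cell limit."""
    return n_rows * n_cols <= MAX_CELLS


def partition_columns(row_vars, col_vars):
    """Split col_vars into blocks so each pass has at most 240 cells."""
    if not row_vars or len(row_vars) > MAX_CELLS:
        raise ValueError("row list must hold between 1 and 240 variables")
    width = MAX_CELLS // len(row_vars)   # widest block that still fits
    return [col_vars[i:i + width] for i in range(0, len(col_vars), width)]
```

Under this rule a 30 by 30 matrix needs four passes, with column blocks of 8, 8, 8, and 6 variables.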

4. There are four outputs, which ordinarily punch out one 
after the other without operator intervention. The operator 
may intervene, however, to suppress any of them, since each 
output is controlled by a separate trailer card. All the outputs 
are wired 80-80. The four outputs are as follows: 

a. For each variable designated in control card 1, against
each variable designated in control card 2, an output 1
card is punched, up to 240 cards, as follows:


Description                     Card Columns
Study Identification Number     1-3
Output Identification Number    10
Variable X                      15-16
Variable Y                      19-20
N                               21-30
Sum X                           31-40
Sum Y                           41-50
Sum X Square                    51-60
Sum Y Square                    61-70
Sum XY                          71-80

The limitation on N is that neither ΣX² nor ΣY² may
exceed 10 digits.
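
The six quantities on an output 1 card can be accumulated as in the sketch below. It assumes pairwise deletion, that is, a case contributes to a given X-Y pair only when both scores are present, which is how the "own individual N" described earlier is read here; `pair_sums` is a hypothetical name and missing scores are represented as `None`.

```python
def pair_sums(xs, ys):
    """Accumulate N, sums, sums of squares, and cross-products for one pair."""
    n = sx = sy = sxx = syy = sxy = 0
    for x, y in zip(xs, ys):
        if x is None or y is None:
            continue             # skip the case for this pair only
        n += 1
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    return {"n": n, "sx": sx, "sy": sy, "sxx": sxx, "syy": syy, "sxy": sxy}
```

Python integers do not overflow, so the 10-digit restriction on the sums of squares is a property of the punched field, not of this sketch.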


b. Corresponding to each card in output 1, an output 2 card 
is punched, as follows: 


Description                          Card Columns
Study Identification Number          1-3
Output Identification Number         10
Variable X                           15-16
Variable Y                           19-20
N                                    21-30
Mean X (4 decimals)                  31-40
Mean Y (4 decimals)                  41-50
Standard Deviation X (4 decimals)    51-60
Standard Deviation Y (4 decimals)    61-70
r (3 decimals)                       71-80


It has been found that it is necessary to calculate the
standard deviations accurately to 4 decimals, in order
that the r’s be accurate to 3 decimals.
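
The step from an output 1 card to an output 2 card can be sketched with the usual raw-score formulas. The paper does not state whether the standard deviation uses N or N - 1 as divisor; N is assumed here, and the choice does not affect r. The name `stats_from_sums` is hypothetical.

```python
import math


def stats_from_sums(n, sx, sy, sxx, syy, sxy):
    """Means, standard deviations (divisor N, assumed), and r from raw sums."""
    if n == 0:
        raise ValueError("no complete cases for this pair")
    dx = n * sxx - sx * sx       # N * sum of squared deviations of X
    dy = n * syy - sy * sy       # N * sum of squared deviations of Y
    num = n * sxy - sx * sy      # N * sum of cross-products of deviations
    r = num / math.sqrt(dx * dy) if dx > 0 and dy > 0 else None
    return {
        "mean_x": sx / n,
        "mean_y": sy / n,
        "sd_x": math.sqrt(dx) / n,
        "sd_y": math.sqrt(dy) / n,
        "r": r,                  # None when either variable has no variance
    }
```

When the inputs are integers, the three numerator quantities are exact integer arithmetic, and rounding to 4 and 3 decimals would be applied only when the values are written out.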

c. The third output consists of the r’s in matrix form. If
the matrix is less than 8 variables wide, each row will be 
punched out as a separate card. If the matrix is more 
than 7 variables wide, each row will be partitioned into 
segments of 7 variables each. The segments, each 7 vari- 
ables wide, can be laid alongside each other after being 
printed. The layout is as follows: 


Description                      Card Columns
Study Identification Number      1-3
Row of Leading Correlation       5-6
Column of Leading Correlation    9-10
7 Correlations                   11-80


d. The fourth output is identical to the third output, except 
that the N’s pertaining to each cell are punched out in 
matrix form instead of the r’s. 
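
The partitioning of a matrix row into cards of 7 values each can be sketched as below. Only the grouping is shown, not the punching format; `segment_row` is a hypothetical name, and each returned triple stands for one card (row variable, column variable of the leading value, and up to 7 values).

```python
def segment_row(row_var, col_vars, values, width=7):
    """Cut one matrix row into card-sized segments of at most `width` values."""
    if len(col_vars) != len(values):
        raise ValueError("one value is needed per column variable")
    return [
        (row_var, col_vars[i], values[i:i + width])
        for i in range(0, len(values), width)
    ]
```

The same grouping would serve for the fourth output by passing the N's in place of the r's.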


In order to conserve drum storage, the program is divided 
into three parts. The first subprogram controls the reading 
in of the two control cards. The second subprogram controls 
the accumulation of sums and cross-products. The third sub- 
program calculates the correlations and punches them out. 
The two control cards are inserted between the first and sec- 
ond parts, and the data cards are inserted between the second 
and third parts. The assembled deck is then fed through the 
computer in a single pass, without further operator intervention.

The program calculates to high arithmetic precision. The 
sequence of calculations is such that highly accurate results
are obtained whether samples are small or large. 

The price paid for this program is loss of speed. The number
of cards processed per hour is approximately 18,000 divided
by XY, where X is the number of variables specified in control
card 1, and Y is the number of variables specified in control
card 2. A 10 by 18 matrix for 100 cases would take about an
hour.
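
The speed estimate is simple arithmetic and can be checked with a few lines. The function names are hypothetical; the constant 18,000 is the approximate figure given in the text.

```python
def cards_per_hour(x_vars, y_vars):
    """Approximate throughput: 18,000 divided by the number of matrix cells."""
    return 18000 / (x_vars * y_vars)


def hours_for(cases, x_vars, y_vars):
    """Approximate running time in hours for the given number of data cards."""
    return cases / cards_per_hour(x_vars, y_vars)
```

For the 10 by 18 example, the rate is 100 cards per hour, so 100 cases take about one hour.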


This program will handle complete cases as well as incomplete
cases, square [...] matrices. If, however, many [...] involved,
the programs built [...] cases will be found to be more [...]


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT
Vol. XXII, No. 3, 1962


AN IBM 650 COMPUTER PROGRAM FOR THE
EVALUATION OF A RECIPROCATION INDEX
UTILIZING CLASSROOM DATA BASED UPON
UNLIMITED CHOICES OBTAINED UNDER A
SINGLE SOCIOMETRIC CRITERION


ROMOLO TOIGO

Rip Van Winkle Foundation
Hudson, New York


Background

IN the investigation of correlates of sociometric choice, it is
convenient to proceed in terms of the choice relationships which are
revealed by the sociometric test with respect to each distinct pair
of children. For each pair there are three such possibilities: a
mutual choice, a one-way choice, or no choice. Each of these
possibilities can be assigned a value which indicates the degree of
reciprocation of friendship between the two children. With such
information on each pair, it is then a straightforward matter to examine
the extent to which levels of choice are related to correspondence in
objective or subjective characteristics between the individual pairs
of children. Additionally, it is possible to determine the relationship
between a chooser’s characteristics and the aggregate characteristics
of all others whom he chooses. Thus, questions can be answered
relating to the consistency with which a child with given social and
personal characteristics chooses children of other specified social
and personal characteristics.

In a large-scale ongoing investigation of psychological and
sociological correlates of aggressive behavior among third-grade children
conducted on a county-wide basis (Eron, 1960), information was
required concerning the relationship between a child’s own level of
aggression, his choice of friends on a sociometric test, and the level
of aggression of his friends. Data had been obtained in each of the
third-grade classrooms in the county both with respect to aggression
items and with respect to a sociometric friendship criterion.
Further processing of this information (initially obtained through
utilization of a modified “guess-who” technique in each of
thirty-eight third-grade classrooms) resulted in a vector of choices on the
friendship item for each child (Walder, Greene & Lefkowitz, 1962).
The program presently under description utilized the choice vectors
as input, providing an output which indicated the nature of the
choice linkage that existed between each distinct pair of children
in each class.


Description of Program

Input and Capacity


For each child, input to the program consisted of a vector of zeros
and ones together with identifying information. This vector
indicated the child's choices of other children in the class with respect
to the sociometric criterion. Each element of the vector corresponded
to a given child in the class. If a choice had been made, the element
equaled one; otherwise, it was zero. Unlimited choices were allowed,
as well as no choice.


The distinction between boys and girls was maintained through
use of distinctive identification numbers. Thus, boys had serial
numbers extending from one to thirty, while girls had serial numbers
extending from thirty-one to sixty. The maximum capacity of the
program per class was therefore sixty: thirty boys and thirty girls.
The choice vector for an absent child was replaced with a vector of
3's. This permitted identification in the output of pairs in which one
or more of the children were absent.


Output

One output card was punched for each distinct pair of children.
This card indicated (a) the serial numbers of the pair of children,
(b) the choices in each direction between the two children, and (c)
the reciprocation index for the pair. The index equaled zero if no
choice occurred in either direction, one if there was at least one
choice between the pair, and two if each child chose the other. A
score above two indicated that one of the pair was absent when the
sociometric instrument was administered.
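
The index described above can be illustrated with a short sketch. It assumes the index is the sum of the two directional entries, which reproduces the stated values (0, 1, 2, and above 2 when an absent child's vector of 3's is involved); the listing of the original program is not reproduced in the paper, so this is an interpretation. The names `reciprocation_pairs` and `choices` are hypothetical, with `choices[a][b]` holding child a's entry for child b.

```python
from itertools import combinations


def reciprocation_pairs(choices):
    """Return (a, b, a_to_b, b_to_a, index) for each distinct pair of children."""
    rows = []
    for a, b in combinations(sorted(choices), 2):
        a_to_b = choices[a][b]   # 1 if a chose b, 0 if not, 3 if a was absent
        b_to_a = choices[b][a]
        rows.append((a, b, a_to_b, b_to_a, a_to_b + b_to_a))
    return rows
```

Each returned tuple corresponds to one output card: the two serial numbers, the choice in each direction, and the index.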


Phases of Program 


The program* in absolute language for the basic 650 contained
364 instructions, and consisted of two phases. In the first, the choice
vectors for all children in the class were read in. In the second, the
resulting matrix of choices stored within the computer was
systematically scanned in order to determine the choices between each
distinct pair of children.

Processing time for a class of thirty children was approximately 
six minutes. The program was self-restoring so that the classes could 
be processed sequentially without operator intervention. 


REFERENCES 


Eron, L. D. “Psychosocial Development of Aggressive Behavior.” 
Progress Report: Project M 1726, United States Public Health 
Service, 1960 (mimeographed). 

Walder, L. O., Greene, H. E., and Lefkowitz, D. D. “A Method for
Deriving ‘Flexible’ Sociomatrices from Response Forms
Appropriate to Children in the Third Grade.” Educational and
Psychological Measurement, XXII (1962), 187-191.


* Listing available from the writer. 


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT
Vol. XXII, No. 3, 1962


A COMPUTER COURSE FOR THE 
BEHAVIORAL SCIENTIST 


FRANK B. BAKER 


Laboratory of Experimental Design 
University of Wisconsin 


ALTHOUGH the majority of behavioral scientists are aware of the
computational abilities of digital computers, the proportion of such
persons who involve computers in their research is rather small. The
highly technical vocabulary, the apparent need for a high level of
mathematical sophistication, the high cost, and a reluctance to
venture into such a formidable appearing field have all served to
deter the full use of computers by the behavioral scientist. In
addition, a rather wide communications gap has existed between the
computer specialist and the behavioral scientist. The latter is deeply
involved with his own research and only casually acquainted with
computers. The former, while highly skilled with computers, lacks
familiarity with behavioral research. In order to bridge this gap,
the author has developed a one semester computer course designed
specifically to meet the needs of the behavioral scientist. The
objectives of the course are to provide the student with a working
knowledge of: (a) the specialized terminology used by computer
specialists; (b) standard statistical library routines; (c) the
characteristics of programming; and (d) the role of computers in
research. Working knowledge of these four areas should enable the
behavioral scientist to make the digital computer an integral part
of his research and to stimulate his interest in further computer
applications. The course, open to faculty and graduate students,
presumes the student is familiar with statistical analysis and is
engaged in research.




The typical introductory programming course deals primarily 
with the mathematics of computers and the details of machine
functions. Topics such as Boolean algebra, the algorithms underlying
machine functions, and the instruction repertoire of a real or im- 
aginary computer are taught. A number of mathematical sub- 
routines are usually programmed for didactic purposes. In such 
courses the student has had little or no contact with library routines 
and with the computer hardware. It is precisely such courses which 
have left persons from the behavioral sciences bewildered and dis- 
couraged in regard to computer programming and to computer- 
orientated research. Although such courses are very valuable to 
persons who will eventually do extensive programming, they do not 
meet the needs of the research worker in the behavioral sciences. 
The research worker's initial concern is with the use of the computer 
to perform the analyses required by his problem, not with the in- 
timate details of how the analysis was programmed. 

The computer course discussed below can best be described as a
“backwards” computer course. The initial sections of the course are
devoted to developing the students’ self-confidence through actual
use of the statistical library routines. Then he is taken “backwards”
to the fundamentals of computer programming. Such an approach
permits the student to maximize his use of computers with a
minimum investment. An incidental benefit to the computing center is
the inclusion of funds for computer time in the researcher’s next
budget request.

The computer course for the behavioral scientist has been taught
by the author at the University of Minnesota during 1959-60, at
the University of Maryland in the summer of 1961 as a faculty
seminar, and presently at the University of Wisconsin. The
enrollment in the course has included faculty and graduate students from
areas such as education, psychology, economics, sociology, medicine,
and the department of taxation. The graduates of the initial course
have made extensive use of computers in their subsequent research
projects. Thus the aims of the course have been fulfilled.

An outline of the course is presented below, followed by an
amplification of the details and rationale for each section of the course.


Outline of the Course

I. Introduction to Digital Computers
    A. Internal organization of computers
    B. Storage devices
    C. Input-output devices
    D. Number systems
II. Use of Computing Center Library Routines
    A. Punched card formats
    B. Statistical program write-ups
    C. Example problems
III. Fundamentals of Programming
    A. Flow charts
    B. Machine repertoire
    C. Example problems
IV. Computer Applications
V. Higher Level Languages (FORTRAN)
VI. Review and Test


In the introductory section, the internal organization of a digital
computer is related to a clerk working with a desk calculator, an
experience common to all members of the class. Such an approach
does much to dispel the students' concept of the computer as a “black
box." The various types of storage devices used, the input-output
devices, and the physical hardware of computers are discussed with
the purpose of not only understanding the device, but also
developing the student's specialized vocabulary, a task only too often
delegated to incidental learning. Whenever possible, the actual devices
are inspected to relate the classroom lectures and the hardware. The
relationships between binary, octal, and decimal number systems are
studied, and the students practice entering octal numbers into the
computer console. Such practice serves the two purposes of
developing familiarity with (a) the number system and (b) the computer
console. Although it seems trivial to experienced machine operators,
the first session at a computer console can be a rather traumatic
experience for some students.
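
The relationship among the three number systems can be illustrated with a brief sketch in a present-day language; this is offered only as an illustration of the arithmetic and is not part of the course materials. The function names are hypothetical. Each octal digit corresponds to exactly three binary digits, which is why octal was convenient for console entry.

```python
def decimal_to_octal(n):
    """Octal digit string for a non-negative decimal integer."""
    if n < 0:
        raise ValueError("non-negative integers only")
    return format(n, "o")


def octal_to_binary(octal):
    """Expand each octal digit into its three-bit binary group."""
    return "".join(format(int(digit, 8), "03b") for digit in octal)


def octal_to_decimal(octal):
    """Decimal value of an octal digit string."""
    return int(octal, 8)
```

For instance, decimal 15 is octal 17, which expands digit by digit to the binary groups 001 and 111.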

After the student has become conversant with computer
terminology, the study of write-ups for computing center statistical programs
is initiated. Typically a large proportion of the class has never
designed a punched card format; therefore, it is necessary to devote
a lecture or two to the techniques of laying out a punched card and
to the mechanics of key punching cards. Initially, the program
write-ups are very difficult for the student to understand (it seems
as if only the person who programmed the routine could ever read
the write-up).¹

Each write-up is discussed in class to determine: the manner in 
which the card formats are specified; the restrictions placed upon 
the data; the parameters or control numbers needed; the nature of 
the output; and the cost per analysis. The student is then required 
to select one of the routines discussed and prepare input cards using 
some of his own data, preferably from one of his past or present 
research projects. In the past it has been the responsibility of each 
student to operate the computer when analyzing his data; however, 
a recent change to the nefarious “closed shop” system has eliminated 
this learning experience. At the present time the students submit 
program decks and pick up their output listings later in the week. 
The output listings are then related to the write-up for the particular
program. At this point in the course, the students have obtained
a firsthand knowledge of the use of standard library routines avail- 
able in the computing center. Many students obtain program write- 
ups for analyses needed in their current research and proceed inde- 
pendently to use computer time. 

The third section of the course is a step “backwards” to the 
fundamentals of programming. The flow charts for one of the 
statistical routines previously discussed are used to teach the tech- 
niques of fractionating a problem into programmable elements. 
After a discussion of flow chart techniques, a number of machine
instructions are taught and a simple program written. Since each
student is given an opportunity to de-bug his routine, the errors
detected serve as the basis for class discussion of de-bugging
procedures. In this section of the course, the student obtains enough
programming experience to develop an understanding of the nature
of programming but not enough to overwhelm him. In many cases
it is as important for the research worker to be able to estimate the 
amount of programming necessary as it is for him to be able to write 
programs. 

In the fourth section of the course, several weeks are devoted to open
discussion of the application of digital computers to areas in which
class members have interests. The discussions have had a wide
range, covering topics such as: generalized statistical techniques,
arbitration of labor disputes by the computer, use of computers as
teaching machines, simulation of biological and psychological
systems, and the capabilities of future computers. One student has
proposed writing a computer program to simulate the behavior of a
rat in a Skinner box, a proposal which could lead to widespread
unemployment of psychologists because of automation! This section
of the course has proven to be extremely beneficial to both the class
and the instructor. A great many excellent ideas are put forth, many
of which warrant serious consideration.

¹ A glaring example of poor documentation is a bivariate Lagrangian
interpolation routine published by a computer users group which failed
to mention the input addresses of the arguments!

The student is introduced to higher level languages (FORTRAN) 
in the final weeks of the course. At this stage the student has a 
minimum of difficulty understanding the basics of FORTRAN
and can follow a good instruction manual with a fair amount of 
sophistication. The same problem which was coded in machine 
language is written by the student in FORTRAN and processed by 
the computing center. 

The computer course for the behavioral scientist produces a
research worker who, though not skilled as a programmer, can
effectively use existing computer programs, can make an appraisal
of his programming needs, and can communicate with the
professional programmer. For many students, attainment of these goals
admirably suits their needs. For others, it serves as a starting point
for further programming courses and the basis for formulation of
new avenues of computer-oriented research.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT
Vol. XXII, No. 3, 1962


BOOK REVIEWS


Edited by
WILLIAM B. MICHAEL
University of California, Santa Barbara


Durost and Prescott's Essentials of Measurement for Teachers. GILBERT SAX ... 625
Good's Dictionary of Education (Second Edition). ROBERT A. JONES ... 626
Scheffé’s The Analysis of Variance. [first name illegible] E. WILEY ... 627
Goldfarb’s An Introduction to Longitudinal Statistical Analysis. JAMES M. RICHARDS, JR. ... 630
Goldberg's Introduction to Difference Equations. [reviewer name illegible] ... 632
Riordan’s An Introduction to Combinatorial Analysis. ALBERT [...] WHITEMAN ... 634
Sherman’s A Rorschach Reader. HAROLD BORKO
Buss’ The Psychology of Aggression. MONROE M. LEFKOWITZ ... 636
Smith and Ennis’ Language and Concepts in Education. HANS GEORG STERN ... 639
Ladd and Sayres’ Social Aspects of Education: A Casebook. HANS GEORG STERN ... 640


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 3, 1962


Essentials of Measurement for Teachers by Walter N. Durost and 
George A. Prescott. New York: Harcourt, Brace and World, Inc., 
1962. Pp. 167. 

Durost and Prescott state that the purpose of their text “is to deal 
in a practical way with fundamental issues arising in connection 
with the use of tests by elementary and secondary school teachers.” 
It should be noted at the outset that readers who expect texts de- 
signed especially for teachers to be devoid of content or who expect 
“practical” measurement to be independent of test theory will have 
made an error in their judgment of this text. Durost and Prescott 
have combined both theory and practical applications to form a 
concise and readable text. 

The text contains eleven chapters: 1) The Role of Measurement 
in Instruction; 2) What the Teacher Needs to Know About Stand- 
ardized Tests of Achievement; 3) What the Teacher Needs to Know 
About Standardized Tests of Capacity; 4) How the Teacher Can 
Construct and Use His Own Tests; 5) What the Teacher Needs to 
Know About Norms; 6) How to Compare Measured Capacity and 
Measured Achievement; 7) The Why, When, and How of Grouping 
as an Aid to Instruction; 8) The Problem of Marking; 9) How to
Tell Parents About Test Results; 10) What Constitutes an
Adequate Testing Program; and 11) Simple Statistical Techniques
Applied to Test Scores.

However, any text which covers the content that the present
authors have attempted is bound to run into some difficulty. The
result is a book which can but briefly cover many important topics
in educational measurement. Thus, the authors can afford to spend
but one paragraph on personality and interest inventories, less than
one page on essay examinations, and nothing at all on item analysis.

On the other hand, perhaps the authors spent an inordinate
amount of time on such concepts as modal-age equivalents (3
pages), techniques for grouping students (11 pages), and the
meaning, computation, and uses of stanines (13 pages). The reviewer also
believes that some of the tables in the appendices could be
eliminated. For example, reciprocals and reciprocals of square roots,
products of deviations from 1 to 35 and their squares times numbers
from 1 to 15, and reciprocals and f times the reciprocals (f from 1 to
10) for numbers 25 to 120 could easily have been eliminated. The
relatively long discussion of how to extract fourth roots from one
table could also have been eliminated to make room for more useful
material.

Considering the size of the text, the authors have done a com- 
mendable job in maintaining accuracy without being overly com- 
plex. However, a few inaccuracies and overly complicated material 
are likely to creep into even the very best manuscripts. For example, 
teachers may experience some difficulty understanding what relia- 
bility means when it is first defined as the obtaining of “an identical 
score on repeated measurements” and later as the “degree of rank- 
order agreement between two sets of scores.” More advanced readers 
may be concerned about such statements as “split-half and Kuder- 
Richardson coefficients are . . . measures of the stability of the 
test . . .” or that intelligence tests may be used “to identify indi- 
viduals who have latent mental capacity,” or that Thorndike’s three 
aspects of intelligence are “level, extent, and spread." 

It seems to the reviewer that the use of the term capacity and the
subsequent discussion in the text of under-achievement and
over-achievement, tends to perpetuate anachronisms in measurement.
Indeed the authors state that “an overachiever . . . is achieving at a
level beyond his measured capacity."

In general, the text might have been greatly improved by covering 
fewer topics but in more detail. However, if the text is used by 
teachers who already have some background in measurement, it may 
well prove to be of value as a general review. 

GILBERT Sax 
University of Hawaii 


Dictionary of Education (Second Edition) by Carter V. Good
[...] McGraw-Hill Book Company, Inc. Pp. [...]

The second edition of the Dictionary of Education prepared under
the able leadership of Carter Good, with the financial support of Phi
[...] difficulty of the task of th[...] In keeping with the interests of
[...] of pages indicates that approximately 10 [...]ted to the field of
measurement. [...]clude measurement terms, A Comprehensive
Dictionary of Psychological and Psychoanalytical Terms
by English and English, Dictionary of Statistical Terms by Kendall
and Buckland, and Mathematics Dictionary by James and James,
were used in evaluating definitions. The definitions studied were
found to be concise and accurate. One minor objection which might
be raised in passing is that both Good and English accept
“distribution-free” and “nonparametric” as synonyms; Kendall and
Buckland carefully point out the distinction.

Continuing the comparison of dictionaries, examples of terms 
defined by English but omitted by Good are “response set” and 
“j.n.d.”, while “quartimax” and “gramian matrix” are defined by 
Good but not English. Examples of terms not defined in either 
dictionary are “semantic differential” and “verbal fluency.” The 
citing of these examples should not obscure the fact that both 
dictionaries have remarkably good coverage of measurement terms. 

An interesting feature of the Dictionary of Education is a two 
page summary of abbreviations and symbols. Most of the symbols 
listed are related to the field of measurement. One might object, 
however, to the listed association of the symbol z with the mode,
particularly in view of the widespread usage in psychology and
education of z with standard score and with Fisher's transformation.

Three other minor criticisms may be made. In comparing terms 
in the four dictionaries, the small (about 6 point) type used in the 
Dictionary of Education became a source of irritation with
continued use. However, the reviewer must state that, while the
dictionaries are of similar physical size, the Dictionary of Education
contains about twice as many definitions as the next largest. 

Somewhat atypical is the lack of use of capital letters in the 
names of certain standardized published tests, e.g., differential apti- 
tude test, thematic apperception test, and progressive matrices test 
but MAPS test, CAVD test, and Army Alpha test.

The last objection is the rigid use of inverse word order with 
compound terms in order to stress the key word or noun form. Some 
examples where the normal order would be preferred are: test, t; 
deviation, standard; and coefficient, correlation.

Despite the minor objections previously stated, the Dictionary of 
Education is an excellent volume which the reviewer highly en- 
dorses. For readers with interests in education the dictionary will be 
of inestimable value.

ROBERT A. JONES
University of Southern California 


The Analysis of Variance by Henry Scheffé. New York: John Wiley
& Sons, Inc., 1959. Pp. xvi + 477. $14.00.

Scheffé states in his preface: “. . . I have tried to elucidate in a
unified way what appears to me at present to be the basic theory of
the analysis of variance." This emphasis on theory may at first
seem to preclude the thoughtful study of this book for researchers 
and applied statisticians in the fields of education and psychology. 
However, although some parts of the book are fairly mathematical, 
Scheffé is careful to give ample warning about these parts and the 
continuity of the book is not interrupted by omitting them. The 
book, in the opinion of this reviewer, is of great enough value to 
require the attention of any researcher who uses the analysis of 
variance in his research. 

The book itself is divided into two separate parts. The first of 
these is a rather complete exposition of the theory (and its relation- 
ships to application) of the fixed effects, independence, and equality 
of variance case. The first chapter in this section is concerned with 
the theory of estimable functions and orthogonal partitioning of the 
sums of squares implied by the least squares theory of Gauss. The 
theorems of this chapter, of course, follow from basic mathematical 
and statistical considerations and are derived without any
restrictive distributional assumptions.

The second chapter, which adds the basic (normal) distributional 
assumptions, is concerned with the derivation of the standard 
analysis of variance hypothesis testing and with confidence region 
estimation procedures. It might be said at this point that the
appendices give the requisite techniques from matrix and vector 
algebra, as well as good sections on the multivariate normal distri- 
bution and the distributional properties of the statistics implied by 
the analysis of variance. The sections on vector and matrix algebra 
are exceptionally well done and would serve any reader well as a
short review course in these areas. 

The book continues with the theory of the one-way classification,
and a general exposition of the two-, three- and higher-way
classifications. The author then gives himself some latitude in deviating
from the standard analysis of variance and includes a chapter on
[...] handling and misinterpretation
of experiments which fall into this category. With this said, it
[...] contrasting procedures,
which is well known, will not be belabored here, but the correct
application of these contrasts is too often misunderstood. This
section, while it may lack some computational clarity, is very clear
in its distinctions of the utility and power of the Tukey and Scheffé
techniques. It is certainly required reading for those who apply
these procedures.

Another important point, often overlooked or misunderstood in 
psychology and education, is that of power in the analysis of vari- 
ance. This concept, while covered superficially for simple cases in 
most applied textbooks in our field, has to this reviewer’s knowledge 
never been applied to the analysis of variance. The tendency in 
many experimental situations seems to be rather dichotomous. In 
many studies the number of observations is very small, and in others 
it is very large. Very few experimenters take the time to investigate 
systematically and to determine the number of observations neces- 
sary to insure adequate power for desired comparisons. In many 
situations the experimenter has an adequate estimate of the variance 
of his response variable (e.g., a standardized test) or can get one 
easily through a pilot study. The calculation of power, which is both 
conceptually and computationally simple, is given by Scheffé (pp.
62-65).
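
As a rough sketch of what such a calculation involves (the notation below is assumed for illustration and is not quoted from Scheffé), consider a one-way fixed-effects layout with I groups of n observations each, group means \mu_i, and common error variance \sigma^2. The F statistic then follows a noncentral F distribution, and the power of the level-\alpha test is the probability that it exceeds the usual critical value:

```latex
\[
\delta^2 = \frac{n \sum_{i=1}^{I} (\mu_i - \bar{\mu})^2}{\sigma^2},
\qquad \nu_1 = I - 1, \qquad \nu_2 = I(n - 1),
\]
\[
\text{power} = \Pr\{ F'_{\nu_1,\nu_2;\delta} > F_{\alpha;\nu_1,\nu_2} \},
\qquad \phi = \frac{\delta}{\sqrt{\nu_1 + 1}} .
\]
```

Given a pilot estimate of \sigma^2 and the smallest differences among the \mu_i worth detecting, the experimenter can increase n until the power read from charts indexed by \phi reaches the desired level.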

The second main section of the book is concerned with models 
other than the fixed. It includes a chapter on the random effects 
model, including complete classifications of any degree, and nested 
designs. Another chapter is devoted to mixed models and includes 
higher-way layouts also. A significant comment might be made at
this point. Scheffé points out that procedures which are designed to
test hypotheses concerning variances, rather than those concerning
means, are much more affected by departures from assumptions
than the latter. In the words of Box, they are not as “robust.” In
light of this, much more careful attention must be given to assump-
tions when a random or mixed model is used than when a fixed
model is employed.

In every chapter in the second section, careful attention is given 
to the character of the expectations of the mean squares. The rules
of their existence and estimation are given for the random and
mixed models and for nested designs in both. Until recently, proper
error terms for testing procedures have been in question in many
situations. In all cases, the proper error is noted and in certain mixed
models where these are not exactly obtainable, procedures are given
for their approximate calculation.

In the chapter preceding the last, the topic of randomization
models is covered. A clear treatment of their logic and conceptuali-
zation is given, with proper emphasis on their conditional character
(in the probability sense). The fundamental nature of the permuta-
tion tests is given along with their approximation by the usual
normal-theory tests. The chapter also considers the implications of
these models and randomization in general on the reliability of
normal theory significance tests.

The last chapter is concerned with the effects, on the analysis of 
variance, of various departures from assumptions. Some of these 
effects have been considered previously. The topics covered are: 
non-normality, as measured by coefficients of skewness and kurto- 
sis; heterogeneity of variance; and lack of independence. As men- 
tioned before, fixed effects models are generally robust, while ran- 
dom and mixed models are fairly sensitive. Concerning fixed effect 
models, moderate amounts of skewness and kurtosis are not gener- 
ally very damaging. Most heterogeneity of variance has little effect, 
as long as the cell numbers are nearly equal. Correlation among
observations is a fairly serious condition under any model. These
topics have been treated extensively by Box. The final section in the
chapter is concerned with the theory of transformations and their utility.

The book has another outstanding feature; namely, the section of 
the tables which contains useful graphic representations of the non- 
central F-distribution. These are not tabulated in any other readily 
accessible source, and they are the necessary concomitants of any 
calculation of power. 

In retrospect, several general comments can be made. The clarity
of the presentation is, in general, good. This clarity is helped by the
author's concern for readers who wish to apply their knowledge. In
addition, the level of ...
obtain clear understanding is not impossibly high for a person who
has had calculus courses. The utility of the book, then, remains


Davy E. Wey 

Laboratory of Experimental Design 

University of Wisconsin 

An Introduction to 1, 
Goldfarb. Glencoe, 
Harrison Gough has stated that there are two questions a
conscientious reviewer must attempt to answer in evaluating a book.
The first question is “Is such a book needed?” The second question
is “Does the book meet the need that does exist?”


masses of longitudinal data . . . show what he

possibilities of the longitudinal method . . .

the peculiarities and the advantages and the disadvantages of the
various types of longitudinal studies including the panel studies,
cohort studies, long-term public health studies, and other examples 
of growth and change.” It would certainly appear that there is a 
considerable need for such a book so that the answer to the first 
question mentioned above is “Yes”; a book such as this is needed. 
Unfortunately, in the opinion of the reviewer, the answer to the 
second question can only be “No.” This is regrettable, because this
book has many potential strengths which do not quite come off in 
its present form. 

The primary reasons for this negative conclusion by the reviewer 
are that this book is not well organized and is not written with any
specified audience in mind. In trying to be all things to all audiences, 
it fails to be really useful to any. 

Unfortunately for the reviewer, such criticisms are difficult to 
document without becoming picayune. Since little is contributed by 
long listings of specific inadequacies, this reviewer has chosen to 
report one example for each of his two criticisms. He would like to 
assure the reader that these examples are not unrepresentative and 
are not isolated cases. 

The first criticism is that the book is not well organized. An
outstanding example of this is its treatment of cohort studies. A
reference to “cohort studies" is found in neither the table of contents 
nor the index. This reflects the fact that (in spite of the claims on 
the jacket) there is no section in the book dealing specifically with 
the definition, advantages, and disadvantages of this method. Cohort 
studies are mentioned only briefly in scattered places through the 
book and in such a way that no information is presented which 
would be useful to a person who is unfamiliar with such method- 
ology. 

The second criticism is that the book is not written with any 
specific audience in mind. On page 103, the author presents a de-
tailed description of a standard IBM card and on page 122 states
without further explanation that “the basic sampling problem of
longitudinal data arises from the fact that a sample selected at any
one period of time is expected to be representative of a population
which is still to be formed.” It would be very surprising if there
were a substantial number of people able to understand this state-
ment about the sampling problem who are unfamiliar with IBM
cards.

A somewhat secondary criticism is that the title is misleading 
since the book is not about statistics (except for a brief chapter on 
the breakdown of data in tables). Indeed, the author’s only mention 
of “significance” is to state that “the significance of the differences 
is determined by statistical considerations.” A more appropriate
title might be An Introduction to the Longitudinal Method. 

It should be noted that these defects are as much, or perhaps
more, the responsibility of the publisher as of the author. One of the
responsibilities of a publisher is both to edit carefully and to aid 
the author in revising and organizing his manuscript. It would 
appear that very little aid of this sort was provided to Professor 
Goldfarb, an impression that is strengthened by a generally inade- 
quate index. 

As mentioned earlier, this book, in spite of serious inadequacies, 
does have many potential strengths. The reviewer personally hopes
that a revised edition will be prepared which eliminates the defects 
and capitalizes on these strengths. One such potential strength is 
that Professor Goldfarb is one of very few authors who attempts to 
deal with the implications of computers and other automatic data 
processing for research. In a revised edition this section should be 
expanded, the descriptions of the function of such machines as the 
card sorter should be clarified, and more attention should be paid to 
progress in computer technology since the appearance of UNIVAC.

A second potential strength is Professor Goldfarb's use of his own
research in connection with the Health Insurance Plan of Greater
New York as a vehicle for discussing problems involved in the longi-
tudinal method, especially the day to day problems which are never
reported in journal articles. This is a most interesting and, in the
opinion of the reviewer, a valuable pedagogical device. However,
this material also needs revision since in its present form it would
be of little use to a neophyte in research.

In summary, the reviewer considers this book to be potentially
valuable if revised but unacceptable in its present form. 

James M. RICHARDS, JR. 
University of Utah 


Introduction to Difference Equations by Samuel Goldberg. New 
York: John Wiley & Sons, Inc., 1958. Pp. x + 260. $6.75.

Many processes may be described in terms of a variable which
takes on discrete values. By making the assumption Δx = dx, these
processes may be described by differential equations. It is, however,
possible through the use of difference equations to describe them
... In the past few years a number of books have
... dealing with this topic. These have joined a rather slim
list of volumes which have been for the most part not elementary.
... as introductory texts for mathematics




ence equations.” The book can be highly recommended for this 
purpose provided the reader has had some mathematics training 
beyond college algebra.

Samuel Goldberg, who is associate professor of mathematics at
Oberlin College, is the author of a monograph on difference equa- 
tions written at the invitation of the Social Science Research Coun- 
cil. His current book is an expanded and revised version of the 
monograph. Since the book is intended mainly for psychologists and 
economists, the many illustrations and problems are for the most 
part from these fields. It is appropriate then to examine the book 
from the point of view of the prospective psychologist reader. 

Technically, as Goldberg states, sufficient preparation for the 
reader would be facility with algebra, some knowledge of trigonom- 
etry, and a little mathematical maturity. It is likely, however, that 
a person with such abilities would have taken a course in analytic 
geometry and calculus. At any rate this might be considered good 
preparation. The author has been very careful in his use of mathe-
matical terms. For example, he defines 
and N!; the binomial theorem and imaginary numbers are reviewed; 
functions and graphs are explicitly defined. It should be possible 
then for someone with little mathematical training to understand 
the material with some hard work.

The substance of the book is motivated by a problem which
clearly is most appropriately handled by difference methods, the
fluctuation of response probability as a function of a discrete varia-
ble, trial. The remainder of the book is divided into four chapters,
the first of which defines the basic differencing operators (along
with material on the operator concept). Chapter two provides the
reader with the means for setting up a difference equation. The
solution of a first order difference equation is derived in an intuitive
fashion, and the concept of solution is explained. Extensive material
on sequences is included since the solution of a difference equation
is a sequence of numbers. A third chapter gives the general method
of solving the linear difference equation with constant coefficients,
with emphasis on the first order equation for simplicity. This is a
very common type of equation and the general method of solution
is fairly straightforward. The book is filled out with sections on
generating functions and matrix algebra relating these topics to
sequences and difference equations.
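
To make the first order case concrete, the following minimal sketch (not taken from Goldberg's text; the function names and the numerical values are hypothetical) treats the equation y(k+1) = a*y(k) + b. It generates the solution sequence by direct recursion and compares it with the standard closed form, y(k) = a^k * y(0) + b * (1 - a^k) / (1 - a) when a is not 1, and y(k) = y(0) + k*b when a is 1.

```python
def iterate(a, b, y0, n):
    """Return [y_0, ..., y_n] for y_{k+1} = a*y_k + b by direct recursion."""
    ys = [y0]
    for _ in range(n):
        ys.append(a * ys[-1] + b)
    return ys


def closed_form(a, b, y0, k):
    """Closed-form value of y_k for the same first order linear equation."""
    if a == 1:
        # Degenerate case: each step simply adds b.
        return y0 + k * b
    return a ** k * y0 + b * (1 - a ** k) / (1 - a)


# Hypothetical learning-curve example: a response probability that starts
# at 0.1 and moves a fixed fraction of the way toward 1.0 on every trial.
probabilities = iterate(0.8, 0.2, 0.1, 10)
```

With these assumed values the sequence rises toward the equilibrium b / (1 - a) = 1.0, which is the kind of trial-by-trial change in response probability the review mentions.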

This is a very interesting and well written book. The only real 
fault lies in the format of theorems and definitions. They are num- 
bered sequentially in each chapter and are not in bold type. Refer- 
ences to previously proven theorems thus require needless searching.
Hopefully, this defect will be remedied in a later edition. This book
can be recommended for readers whose background would enable 
them to read the more mathematical texts, both because of the many 
psychological examples and because of the wealth of material of 
general mathematical importance. The less advanced reader will 
find that it opens an area of mathematics with important applica- 
tions in psychology. 

ELLIOT M. CRAMER

The George Washington University 


An Introduction to Combinatorial Analysis by John Riordan. New 
York: John Wiley & Sons, Inc., 1958. Pp. x + 244. $8.50.

Combinatorial analysis, which has its origins in the Ars Com- 
binatoria of Leibniz, has emerged in the twentieth century as a 
powerful technique for dealing with problems of selection, arrange- 
ment, and enumeration within a finite or discrete system—such as 
the aggregate of all possible states of a digital computer. The new 
methods in this fascinating branch of modern mathematics are 
being employed to solve problems in the biological, social, and
physical sciences. Future discoveries will undoubtedly far surpass
those of the past. 

The spirit of combinatorial analysis has been vividly portrayed 
by Hermann Weyl, a grand master of mathematics. In his Philosophy
of Mathematics and Natural Science (Princeton, 1949, p. 237) Weyl
has written as follows (rearranged slightly for quotation here):


"Perhaps the philosophically most relevant feature of modern
science is the emergence of abstract symbolic structures as the
hard core of objectivity behind—as Eddington puts it—the
colorful tale of the subjective storyteller mind. The combina-
torics of aggregates and complexes deals with some of the sim-
plest such structures imaginable. It is gratifying that com-
binatorial mathematics is so closely related to the philosophically
important problems of individuation and probability, and that it
accounts for some of the most fundamental phenomena in
inorganic and organic nature. This structural viewpoint occurs
in the foundations of quantum mechanics. In a widely different
field John von Neumann's and Oskar Morgenstern's attempt to
found economics on a theory of games is characteristic of the
same trend."


The need for a textbook on combinatorial analysis is manifest.
Dr. Riordan's book is the first on the subject in recent years, and its
appearance is therefore especially welcome. It will be extremely
useful to the mature scientist who wishes to learn something about 
the methods for finding the number of ways in which some well 
defined operation can be performed. The author's aims are precisely 
indicated by the following excerpt from the dust jacket. 


"In addition to offering an up-to-date summary of an impor- 
tant and stimulating part of mathematics, Riordan's work
provides statisticians with an intriguing study of the use of 
generating functions; it offers readers in the field of computers 
numerical tables and incidental material relative to the calculus 
of finite differences; applied mathematicians will find useful and 
interesting the general description and worked answers to 
combinatorial problems; engineers are offered important back- 
ground information for many of the problems now arising in 
systems engineering, for example, coding problems in data 
transmission.” 


Each chapter ends with an extensive set of exercises for the 
reader. Many of these problems are by no means routine in nature, 
and some indeed are miniature research problems. 

In writing this book the author has succeeded admirably in 
achieving his objectives. He has organized his material with great 
didactic skill. The homogeneity and cohesiveness of the chapters 
testify to his ability as a writer. To be sure, some topics of great
current interest, such as finite geometries, difference sets, and
combinatorial designs are not covered. But these omissions, which
are a reflection of the enormity of the subject, in no way detract
from the importance of the book.

To those who would plunge headlong into the volume, the follow-
ing admonition is perhaps in order. Because of the inherently subtle 
nature of the mathematics of combinatorics, the average reader is 
faced with rather severe demands on his algebraic and computa- 
tional skill. It cannot be denied that combinatorial mathematics is 
formidable mathematics. Even the expert may not find everything
here easy to read, and the tyro may find much that is impenetrable.
The reviewer is certain, however, that developers of the art of meas-
uring individual differences who delve deeply into the book will be
richly rewarded.


ALBERT LEON WHITEMAN
University of Southern California 


A Rorschach Reader by Murray H. Sherman (Editor). New York:
International Universities Press, Inc., 1960. Pp. xvi + 440. $7.50.

The Rorschach Technique, for better or worse—and the contro-
versy still rages—is part of the tools of the trade for the clinical
psychologist. As such, the Rorschach, along with other projective
techniques, is taught in almost every graduate school which grants
the doctorate degree in psychology. While it is relatively simple to 
teach methods of administration, scoring, and principles of inter- 
pretation, it is difficult to communicate the concept that the 
Rorschach is less of a test, in the accepted sense of the word, and 
more of a research tool which can be used to obtain a measure of 
psychological understanding of individual behavior. As Sherman 
states in his preface, “If this kind of exploratory approach is taken, 
the Rorschach may be viewed as an extremely valuable way of 
studying how patients vary in their perceptions, how psychologists 
vary in their interpretation of perception, and how therapists vary 
in their ability to communicate and accept meanings and values 
differing from their own." 

A Rorschach Reader brings together between one set of covers 28 
articles which have appeared in various journals over the past ten 
or twelve years. Intended for use as collateral reading in a Rorsch- 
ach course, it provides the instructor and students with sufficient 
material to make such a course an interesting and valuable one. The
major problem in a book of readings which is made up of
previously published articles is to provide some continuity so that 
it reads like a book and not like a journal. The editor succeeds in 
accomplishing this task through the use of explanatory comments 
which precede each article. In these introductory remarks he points 
out the salient feature of the article which led to its selection, and he 
relates it to other articles in the collection. 

The Reader should prove to be an excellent supplementary text
and a worthwhile addition to the ever growing Rorschach literature.

HAROLD BORKO
System Development Corporation




... accomplished. The book should
...ng and critical review of most of the impor-
tant studies on aggression. In this respect it is unique in the litera-




manipulative type of study employing dependent and independent 
variables. In this section and throughout, the literature on aggression
is covered extensively—one of the outstanding features of the book
—and each chapter is followed by a list of references.

Differentiating anger, an emotional reaction, and hostility, a 
negative attitude, from aggression, an instrumental response, the 
author in the first chapter defines the three terms in a behavioral 
context. Thus the behavioral approach is emphasized throughout the 
book. Aggression is defined as “. . . a response that delivers noxious
stimuli to another organism (author’s italics) ....” A discussion of 
intent follows and then Buss holds that certain acts involving the 
delivery of noxious stimuli cannot be called aggressive. For example, 
a dentist repairing a tooth, a doctor giving an injection, or a parent 
punishing a child are not aggressive behaviors provided that a 
greater good in terms of a socially desirable effect is sought. Unfor- 
tunately this proviso subjects the definition of aggression to value 
judgments of what is good or socially desirable. Seemingly, Buss 
becomes involved with the teleological question of intent while 
purporting to exclude it from his definition of aggression.

Direct measurement of aggression, as compared to indirect 
measurement by questionnaire, is introduced in the chapter dealing 
with laboratory experiments. The problems studied are concerned 
with varying the intensity of aggressiveness along some easily 
quantified dimension as well as with creating a situation in which 
aggression is permitted to occur, is unpunished, is instrumental in its 
value, and not injurious to any substantial degree. To meet these 
criteria, Buss describes an apparatus termed an aggression machine. 
Delivery of an electric shock, the intensity of which is graded by a 
series of buttons from 1 to 10, in a contrived learning experiment is 
the mode of aggression. The first subject is instructed in the role of 
a learning experimenter and the second subject is actually an 
accomplice of the true experimenter. This reviewer feels that the
technique is rather ingenious and permits a number of problems in 
the sphere of aggression to be subjected to laboratory investigation. 

The material on physiology stresses those studies dealing with 
human anger. The measures used are heart rate, changes in systolic
and diastolic blood pressure, galvanic skin response, the cold pressor
test, rotation in a centrifuge to blackout, urine analysis, and gastric
changes.

Part two treats aggression as a component or variable of person- 
ality, i.e., as a lasting characteristic of behavior, and the indirect
measurement of aggression is stressed in the first two chapters of
this section. The author does a very creditable job in pulling to-
gether and evaluating the numerous studies of aggression with the
Rorschach, the TAT, and a variety of inventories. Studies under-
taken with the Rorschach are discussed under two groupings: scoring
of hostile content and scoring of the traditional formal elements
(e.g., form level, movement, and location). In the hostile content
category, investigations of normals as well as deviant populations 
are considered. Hostile content on the Rorschach is directly related 
to aggressive behavior. Thus the drainage hypothesis of aggression
is rejected; rather, it seems the Rorschach samples a larger popula-
tion of hostile responses. The results of studies employing formal
scoring categories to measure aggression are equivocal in that no
consistent relationship obtains between such scoring and aggressive 
behavior. 

Results of experiments using the TAT to measure aggression are 
analyzed by the author under four categories: clinical, laboratory, 
miscellaneous, and modification of the TAT. Both normal and
deviant populations are studied in the clinical and laboratory cate- 
gories. For the deviant group in the clinical situation, the general 
design was to predict to an assault criterion from aggressive 
thematic content and Buss states, “TAT aggression is directly 
related to assaultiveness.” The problem of an adequate criterion 
measure impedes investigations with normals—most frequently 
ratings and, in a few cases, athletic activity are used—and generally 


In the laboratory situation, the TAT is used in an attempt to
measure transient changes in aggression induced by the experi-
menter—again with ambiguous results. Modification of TAT pic-
tures specifically to elicit aggressive content shows that decrease in
card ambiguity produces an increase in the number of subjects
developing fighting themes. Accordingly the author concludes, “The
best hope for future projective Measurement of aggression would 
seem to lie in modifications of the TAT.” 


... conclusions concerning the projective measurement of

...eed by inventories; projective techniques are more appli-
cable to the measurement of aggression in deviant subjects than in

...niques for this inventory are discussed and then related to the mini-
mization of the variable of social desirability. The author concludes 
that the Buss-Durkee Inventory as compared to others demonstrates 
the most promise for the study of aggression. 

Part three is concerned with two themes: social aspects of aggres- 
sion as related to prejudice, and the study of aggression in childhood
with emphasis on developmental trends. Techniques for measuring 
aggression in children include observation and the use of check lists, 
peer rating devices, doll play, destructive games, and projective 
methods. Because doll play has received so much methodological 
attention, Buss feels that this technique is currently most promising 
—at least for preschool children.

One of the shortcomings of the book, in this reviewer's opinion, is 
the inadequate subject index, which is particularly unfortunate 
since one of the book’s chief uses will undoubtedly be as a reference 
work. This reviewer would have desired a broader coverage of pro-
jective techniques in their relationship to aggression. Such coverage
could have been readily achieved at the expense of the materials
dealing with the Freudian and neo-Freudian theories of aggression
so easily obtainable elsewhere.

Mention of a contradiction should be made. On page 60 Buss says,
“Anger tends to energize aggressive behavior, and one aspect of this
energizing function is a lowering of the threshold for the occurrence 
of aggressive responses.” But on page 89 he says, “When aggression 
occurs in the absence of anger, there is an increase (author's italics) 
in the tendency to aggress.” Finally, the author is to be commended 
for his consistent attempts to state and to interpret the conflicting 
findings in the studies on aggression. 




MONROE M. LEFKOWITZ 
Rip Van Winkle Foundation 
Hudson, New York 


Language and Concepts in Education by Othanel B. Smith and 
Robert H. Ennis (Editors). Chicago: Rand McNally & Co., 
1961. Pp. 221.

In thirteen relatively short essays this volume presents penetrat-
ing insights into a number of educational problems often obscured
by semantic difficulties. Words, words, words, and how they often
obscure meaning seems to be the unifying theme of this book. The
pattern followed by most of the authors is to take an oft-heard
word or phrase, such as “Mastery” or “Learning by Experience,”
and to analyze it thoroughly. The authors usually begin by examin-
ing the casual or conversational use of the word or phrase and then
try to show the concept as used in the literature or by leading
philosophers. The contributors sum up their essays by revealing
their own preferred way of using the concept.




Robert H. Ennis, one of the editors, in his contribution (Chapter
Seven), entitled “Is It Impossible for the Schools To Be Neutral?”
shows how such philosophers as Dewey and Bode claim that schools
cannot be neutral. Then he proceeds to analyze the word “neutrality”
to show that neutrality, as well as its opposite, presupposes intent on
the part of the individual or institution under observation. If there
is no intent not to be neutral, concludes Ennis, neutrality is pre-
served.

It is hardly possible in this necessarily brief review to comment
on each of the essays. Some, indeed, may appear to the s...
educator to border on the trivial. Others, such as the chapter en-
titled “On the Reduction of ‘Knowing That’ to ‘Knowing How,’”
display considerable elegance in their exposition of difficult topics.

The weakness of most collections such as this is the lack of unity.
The reader will look in vain for one strong guiding philosophy. One
also has the feeling that some of the treatments are too brief, per-
haps even superficial. Chapter nine, “Equality of Educational Op-
portunity,” could benefit by more extensive treatment.

On the whole, this is a collection of able authors, who have given
careful thought to their topics. The book should be welcomed by


meant to be, a basic text; but it offers stimulating collateral reading
for the advanced student of education.

HANS GEORG STERN
Los Angeles City Schools


Social Aspects a 
William C. Sayres (Editors). Englewood Cliffs, New Je 


Wisely, the book d...
raised. The authors have tried rather rigorously, and this revi...
... problems ... Where
... are involved, most available
pertinent information, such as ...




school size and the like, are included in the presentation of almost 
every case. 

A virtue of the book is the wide variety of cases reported. A con- 
venient table on the front end cover lists these problems that arise 
in one or more of the cases: city schools, civil rights, consolidation 
of school districts, curriculum change, discipline, drop-outs, ele- 
mentary education, faculty collaboration, family attitudes, legal 
actions, legislation, personnel policy, press, private groups’ effects 
on schools, private schools, public opinion, pupil groupings and acti-
vities, religion and the schools, rural and small-town schools, school
board actions, secondary education, state-local relations, suburban 
schools, teachers’ organizations and teaching procedures. 

The cases, according to the introduction, represent reported truth, 
and differ in this respect from some other case collections in the lit-
erature, which present fictionalized accounts. These cases present 
just such information as was available to the decision maker in the 
report. 

One regrets that private schools and elementary education, with 
their many problems, are not represented by more cases. One case 
deals with a private school, and only three out of the 22 with ele- 
mentary education. It becomes obvious that the book will render its 
chief service in classes devoted to the study of secondary public 
education. Perhaps it might have been better to eliminate the few 
problems dealing with elementary education from this volume alto- 
gether, reserving a further book for the presentation of cases dealing 
exclusively with elementary education.

The virtues of this volume, however, far outshine its minor de- 
ficiencies. The reviewer was not able to inspect the teacher's manual 
available with this book, which supposedly contains more complete 
information on some of the sociological classifications mentioned 
but briefly in this work. The volume should be most useful as a 
main text or as collateral reading in courses for public school ad- 
ministrators or teachers. 

HANS GEORG STERN
Los Angeles City Schools 


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT


Editor: G. Frederic Kuder, Duke University
Associate Editor: John A. Hornaday, Greensboro College
Assistant Editor: Joan F. Hornaday
Business Manager: Geraldine R. Thomas


BOARD OF COOPERATING EDITORS


LOUIS D. COHEN
University of Florida
HAROLD A. EDGERTON
Performance Research, Incorporated
MAX D. ENGELHART
Chicago City Junior Colleges
E. B. GREENE
Chrysler Corporation
J. P. GUILFORD
University of Southern California
E. F. LINDQUIST
State University of Iowa
FREDERIC M. LORD
Educational Testing Service
ARDIE LUBIN
Walter Reed Army Institute of Research
SAMUEL MESSICK
Educational Testing Service
WILLIAM B. MICHAEL
University of California, Santa Barbara
M. W. RICHARDSON
Richardson, Bellows, Henry and Co.
JOHN H. ROHRER
Georgetown University School of Medicine
P. J. RULON
Harvard University
DAVID SEGEL
Indiana University
C. L. SHARTLE
Ohio State University
H. C. TAYLOR
The W. E. Upjohn Institute for Community Research
THELMA G. THURSTONE
University of North Carolina
HERBERT A. TOOPS
Ohio State University
E. G. WILLIAMSON
University of Minnesota
BEN D. WOOD
Columbia University
DOROTHY ADKINS WOOD
University of North Carolina


VOLUME TWENTY-TWO, NUMBER FOUR, WINTER, 1962


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 4, 1962


INTERNAL-CONSISTENCY RELIABILITY FORMULAS 
APPLIED TO RANDOMLY SAMPLED SINGLE-FACTOR 
TESTS: AN EMPIRICAL COMPARISON 


LEE J. CRONBACH AND HIROSHI AZUMA¹
University of Illinois 


In the twenty-five years since Kuder and Richardson introduced 
their famous formulas, the interpretation of internal-consistency 
coefficients has become increasingly confused. Reliability has been 
defined in many ways, conflicting assumptions have been made in 
alternative derivations of certain formulas, and new formulas have 
not been adequately integrated with the remainder of the literature. 
Recommendations regarding the treatment of dichotomously-scored 
tests have been particularly diverse. This study compares several 
internal-consistency formulas and evaluates how well they serve 
various purposes. Our comparison will, at the outset, be completely 
empirical, showing how coefficients compare when all formulas are 
applied to the same test. The study is a by-product of an attempt 
(in collaboration with Gleser and Rajaratnam) to reformulate the 
foundations of reliability theory and to offer a better interpretation 
of coefficients, but in this presentation we shall subordinate theo- 
retical issues so that the reader can align the results with his own 
conceptions of reliability.


Plan of Investigation 


We adopt the method of Brogden (1946a, 1946b), generating hy-
pothetical tests according to certain specifications and examining
how internal-consistency coefficients vary with the specifications.
Like Brogden, we assume that items have uniform tetrachoric in-
tercorrelations and hence represent a single "content" factor. We
confine attention to items scored dichotomously (i.e., every response
is assigned one of two possible scores). We create several randomly
parallel tests by sampling at random from a defined universe of
items (cf. Lord, 1955a, 1955b; Tryon, 1957). For each test, we cal-
culate various internal-consistency coefficients. For purposes of
comparison, two "external" coefficients are also calculated: the
correlation between one test and another, and the correlation of
test score with the person's average score over the entire universe
of items.


1 A study conducted under grant M-1839 from the National Institute of
Mental Health. Additional assistance was provided by the Institute for Ad-
vanced Study. The assistance of Kern Dickman in computer programming is
gratefully acknowledged.


Let r_ij be the postulated interitem correlation. (Throughout this
paper, we shall use r to designate tetrachoric correlations, and ρ or φ
for product-moment correlations.) The "difficulty" P_i is the pro-
portion of persons passing or endorsing item i. The distribution of P
in the universe is specified a priori as uniform over the range .01 to 
.99. A computer samples values of P randomly with replacement to 
form a hypothetical test. Each test is specified in the computer 
memory by the n values of P for its items. 
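
The sampling step just described can be sketched as follows. This is only an illustration, not the authors' program: the function name `sample_test`, the seed, and the use of Python's `random` module are assumptions. The grid of values from .01 to .99 in steps of .01 follows the specification given later in the paper.

```python
import random

def sample_test(n_items, rng):
    """Draw n_items difficulties P from .01, .02, ..., .99, with replacement.

    Hypothetical helper: a test is represented only by its list of P values,
    mirroring the description of how each test is held in computer memory.
    """
    grid = [k / 100 for k in range(1, 100)]
    return [rng.choice(grid) for _ in range(n_items)]

rng = random.Random(1962)      # arbitrary seed, for reproducibility only
test_9 = sample_test(9, rng)   # a short test
test_54 = sample_test(54, rng) # a long test
```

Because sampling is with replacement, the same P may occur more than once within a test.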


Coefficients to be Compared 


Notation. A set of n items constitutes a test, and the sum of the
item scores forms the test score. We shall designate tests in general
as T, T', ... A particular test under study will be designated T*
(with items i, j, ...) or T*' (with items i', j', ...). If the items in T*
and T*' are considered to be in one-to-one correspondence, the
corresponding items are i and i'. Interitem covariances are denoted
as follows:

C_ij       covariance of any two distinct items, or, in other contexts,
           of any two items within test T*
[symbol illegible in source]
           covariance between any pair of items, one from each test
C_ii'      covariance of corresponding items, one from each test
C_ij'      covariance of item i in test T* with a noncorresponding item
           in test T*'

Hence the set of crosscovariances is the union of the sets of C_ii'
and C_ij'. We shall also refer to the following expectations:

E C_ij     expected covariance over all item pairs in the universe
E_j C_ij   expected covariance of a fixed i with all other items in the
           universe.

Items may be classified into strata h, h', etc., the items within
stratum h being designated i_h, j_h. There are n_h items in any stratum,
and Σ n_h = n. The mean score for the person over all items in the
universe will be called M. V (= σ²) will indicate a variance, and C
a covariance (e.g., C_ij = σ_i σ_j φ_ij). We assume throughout that we
are dealing with parameters for the population of persons.

Formulas. The numerous formulas to be applied are listed in
Table 1. There is a basic similarity among the expanded forms of
(1), (3), (4), (5), and (7). In the correlation between forms (1), the
numerator is the sum of a square array of crosscovariances,
which may be separated into a diagonal set of C_ii' and an off-diagonal
set of C_ij'. In the numerator of an internal-consistency formula, the
average of the within-test covariances C_ij replaces the average
crosscovariance C_ij' for the noncorresponding items. Other informa-
tion from the single test must replace Σ C_ii' for corresponding
items. In α, we enter n times the average C_ij; in α_S, the sum over
strata of n_h times the average within-stratum covariance; in α_L,
the specially estimated covariance of each individual item with a
closely similar item, summed over items. It is evident that the
coefficients will have the following order from smallest to greatest:
KR21, α, α_S, and α_L.
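
As a rough check on the verbal description of the expanded form of α, the sketch below compares it with the familiar compact form, n/(n - 1) times (1 - ΣV_i/V_T). The compact form is the standard textbook expression and is assumed here, since the formula columns of Table 1 cannot be read in this copy. The function names and the small covariance matrix are invented for illustration.

```python
def alpha_compact(cov):
    """Coefficient alpha from an item covariance matrix (standard compact form)."""
    n = len(cov)
    v_total = sum(sum(row) for row in cov)            # V_T: all variances and covariances
    v_items = sum(cov[i][i] for i in range(n))        # sum of item variances
    return n / (n - 1) * (1 - v_items / v_total)

def alpha_expanded(cov):
    """Alpha as described in the text: n times the average C_ij stands in for
    the diagonal (corresponding-item) terms, and n(n - 1) times the average
    C_ij for the off-diagonal terms."""
    n = len(cov)
    v_total = sum(sum(row) for row in cov)
    off = [cov[i][j] for i in range(n) for j in range(n) if i != j]
    mean_c = sum(off) / len(off)
    return (n * mean_c + n * (n - 1) * mean_c) / v_total

# Invented 3-item covariance matrix, for illustration only.
cov = [[0.25, 0.05, 0.04],
       [0.05, 0.21, 0.03],
       [0.04, 0.03, 0.16]]
```

The two functions agree for any symmetric covariance matrix, since both reduce to n² times the average C_ij divided by V_T.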

Little need be said about the first formula, the product-moment
correlation ρ_T*T*'. We must distinguish between a particular ρ_T*T*'
and the expected correlation. The equivalent forms of classical
theory have uniform intercorrelations, but when tests are randomly
parallel the intertest correlations are not, in general, equal. Even
with uniform interitem correlations r_ij, the variation in P causes the
φ_ij to vary and thus affects the intertest correlation. E ρ_TT' denotes
the expectancy over all pairs of tests that might be formed by
sampling from the universe, and E ρ_T*T' the expectancy for cor-
relations between the fixed T* and all other tests.

In classical theory the uniform ρ_TT' equals the squared correla-
tion of observed score T with "true score." In place of this "true
score" we speak of a "universe score" M, the expected (mean)
score for the person over all items in the universe. The proportion
of observed-score variance accounted for by the universe score is
ρ²_T*M. For randomly parallel tests the ρ²_TM are not uniform; we
distinguish the particular ρ²_T*M from the expectancy E ρ²_TM. The
formulas given in Table 1 for ρ²_T*M cannot be applied to actual
test data since the universe parameters are unknown. But we,
having specified the parameters, were able to determine E_j C_ij and
E C_ij. (If several alternate forms of an actual test were administered,
one could estimate the specific ρ²_T*M by correlating T* with the sum
of scores on the other tests T', T'', ..., and correcting for attenua-
tion to remove the effect of errors of measurement in these other
tests (Cronbach & Terwilliger, 1960).)


TABLE 1
Formulas to be Compared¹

Identification
(1) ρ_T*T*', correlation between forms
(2) ρ²_T*M, coefficient of generalizability
(3) α, intraclass correlation ignoring variation among test means
    (generalized KR20)
(4) α_S, intraclass correlation for stratified-parallel forms,
    ignoring variation among test means
(5) α_L, intraclass correlation for item-parallel tests
(6) α_H, Horst's "corrected" α
(7) KR21, large-sample version of intraclass correlation treating
    variation among test means as error

[The "Compact form" and "Expanded form" columns of this table are
illegible in the source.]

1 These formulas apply only to items scored zero and one; the more general
intraclass formulas would give the same results in the present investigation.


The intraclass reliability coefficient α has been derived or inter-
preted in various ways (Hoyt, 1941; Jackson & Ferguson, 1941;
Ebel, 1951; Cronbach, 1951; Burt, 1955). One of the derivations
(Rajaratnam, et al., 1960a) shows that α is a ratio of two variance
estimates: of V_M (estimated "true variance") to E V_T (estimate of
"expected observed variance" over the set of randomly parallel
tests). For dichotomous items the special case of α is Kuder-Richard-
son Formula 20.

Formula (4), α_S, was introduced by Jackson and Ferguson (1941)
for estimating the reliability of the total score on a test battery.
Since test items can often be sorted into strata on the basis of content
or difficulty, Cronbach (1951) and Tryon (1957) suggested applying
the formula to a single test having manifestly heterogeneous content.
(See also Rajaratnam, et al., 1960b.) Within-stratum covariances
presumably being larger than covariances among items generally,
α_S is expected to be greater than α.
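
A minimal sketch of the stratified coefficient, following the paper's verbal description of its expanded form: the off-diagonal part of the numerator is the sum of within-test covariances, and the diagonal part is the sum over strata of n_h times the average within-stratum covariance. The function name, the way strata are passed in, and the example matrix are assumptions made for illustration.

```python
def alpha_stratified(cov, strata):
    """Stratified alpha from an item covariance matrix.

    cov    : square list-of-lists item covariance matrix
    strata : list of lists of item indices that partition the items
    """
    n = len(cov)
    v_total = sum(sum(row) for row in cov)
    off_diag = v_total - sum(cov[i][i] for i in range(n))
    diag_term = 0.0
    for items in strata:
        n_h = len(items)
        if n_h < 2:
            # No within-stratum covariance can be averaged for a single item.
            raise ValueError("each stratum needs at least two items")
        pairs = [cov[i][j] for i in items for j in items if i != j]
        diag_term += n_h * sum(pairs) / len(pairs)
    return (off_diag + diag_term) / v_total

# Invented example: items 0-1 and items 2-3 form two strata whose members
# covary more strongly with each other (.10) than across strata (.04).
cov4 = [[0.25, 0.10, 0.04, 0.04],
        [0.10, 0.25, 0.04, 0.04],
        [0.04, 0.04, 0.25, 0.10],
        [0.04, 0.04, 0.10, 0.25]]
alpha_plain = alpha_stratified(cov4, [[0, 1, 2, 3]])   # one stratum reduces to alpha
alpha_s = alpha_stratified(cov4, [[0, 1], [2, 3]])
```

With a single stratum the function reduces to ordinary α, and with the two strata shown it gives the larger value, which is the ordering the paragraph above anticipates.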

In internal-consistency analysis, information unique to any item
necessarily is considered as inconsistency or error. Many investi-
gators have defined reliability as the correlation of a test "with
itself," or with another test measuring precisely the same content.
It is required that the second test measure any content that appears
in even one item of the first test. Thus, if the first test asks "Who
discovered America?", the second test is expected also to include
that content in one of its items. Such "matched parallel" or "item-
parallel" tests were treated formally by Lord (1955b) and Tryon
(1957). In such tests every item i of one test is matched in both
difficulty and content to the corresponding i' of a second, item-
parallel form. Since such tests are as similar in content as tests
with distinct items can be, and since alternate forms are ordinarily
prepared under less strict requirements of parallelism, one might
suppose that the correlation between item-parallel tests is an upper
bound to the correlation between alternate forms. A formula for
estimating the correlation between item-parallel tests, which we
shall designate α_L, was suggested by Lord (1955b) and by Cotton,
Campbell, and Malone (1957, their H). In this formula, φ_ii' is the
correlation of i with the matched i'.

Lord suggested how to estimate the covariance of item i with the
hypothetical i' using data from a single test. He proposed to factor-
analyze the matrix of within-test tetrachoric correlations r_ij and to
take the resulting communality for item i as an estimate of r_ii'.
The corresponding φ_ii' can be found by the Kelley formula. But
since tetrachoric correlations do not in general form a Gramian
matrix, the factor analysis is not usually capable of being carried
out. With our hypothetical items, however, the communality is
equal to the specified uniform tetrachoric correlation. When r_ij is
assigned a uniform value, α_L can be evaluated.

All the foregoing coefficients are based in some way upon the
product-moment correlations φ_ij. Loevinger (1947, 1954) objected
to φ as an index of consistency among items because it does not
reflect content homogeneity exclusively. Two items covering precisely
the same content must have φ less than 1.00 if their difficulties are
unequal, because the product-moment correlation cannot equal unity
unless the two variables have identically-shaped marginal distribu-
tions. Loevinger defined φ_ij(max) as the largest value φ could take
with fixed P_i and P_j. Where P_i > P_j, φ_ij(max) = P_j(1 - P_i)/σ_i σ_j.
Loevinger divided φ_ij by φ_ij(max) to get an index of interitem con-
sistency which can reach 1.00.
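
The ceiling on φ can be illustrated numerically. The sketch below implements the expression just given, using σ_i = sqrt(P_i(1 - P_i)) for a dichotomous item. The function names are hypothetical.

```python
import math

def phi_max(p_i, p_j):
    """Largest phi attainable for two dichotomous items with difficulties p_i, p_j."""
    hi, lo = max(p_i, p_j), min(p_i, p_j)
    # The largest possible covariance is lo * (1 - hi): everyone who passes
    # the harder item also passes the easier one.
    sigma_i = math.sqrt(p_i * (1 - p_i))
    sigma_j = math.sqrt(p_j * (1 - p_j))
    return lo * (1 - hi) / (sigma_i * sigma_j)

def loevinger_index(phi, p_i, p_j):
    """phi rescaled by its maximum, so that the index can reach 1.00."""
    return phi / phi_max(p_i, p_j)
```

For equal difficulties the ceiling is 1.00. For P = .90 and P = .50 it is only about .33, which shows how strongly unequal difficulties restrict φ.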


α, like φ, must remain less than unity when item difficulties are
unequal, no matter how homogeneous the content of the test.
Horst (1953), regarding this as a defect, proposed to rescale α in a
manner similar to Loevinger's treatment of φ, so that it would range
from .00 to 1.00. Let α_max represent [...]

[...] have been unacceptable to many reviewers (Humphreys, 1956;
Sitgreaves, 1961). Cronbach (1951) demonstrated that Loevinger's
index is itself extremely sensitive to variation in item difficulty.
This led us to suspect that α_H is unsuitable as a reliability coefficient.

The last of our formulas, KR21, is a lower bound to α, as is readily
seen in its expanded form due to Tucker (1949). Formula (7) is a
special case of the Harris-Fisher intraclass correlation (see Ebel, 1951;
Burt, 1955; Webster, 1960) and of Horst's (1949) "generalized
reliability formula."
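
The lower-bound relation can be seen from the usual textbook forms of the two Kuder-Richardson formulas, which are assumed here because the formula columns of Table 1 are unreadable in this copy: KR20 uses the sum of P_i(1 - P_i), while KR21 replaces it with n times P̄(1 - P̄), which is never smaller. The function names and example values are illustrative.

```python
def kr20(p, v_total):
    """Kuder-Richardson Formula 20 from item difficulties and test variance."""
    n = len(p)
    return n / (n - 1) * (1 - sum(pi * (1 - pi) for pi in p) / v_total)

def kr21(p, v_total):
    """Kuder-Richardson Formula 21: only the mean difficulty is used."""
    n = len(p)
    p_bar = sum(p) / n
    return n / (n - 1) * (1 - n * p_bar * (1 - p_bar) / v_total)

# Invented example: three items with widely spread difficulties and an
# assumed test variance of 1.0.
p_example = [0.2, 0.5, 0.8]
kr20_value = kr20(p_example, 1.0)
kr21_value = kr21(p_example, 1.0)
```

The gap between the two grows with the spread of the P_i, which is consistent with the very low KR21 values the study reports for a universe with P ranging from .01 to .99.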


Test Specifications 


In our procedure it is necessary to specify the statistical charac-
teristics of the universe. The uniform r_ij take on, successively, the
values 1.00, .70, and .30. (All coefficients become zero when r = .00.)
The higher values of r_ij are of theoretical interest because the
behavior of φ is more anomalous as r increases, but we wish to
emphasize results when r is .30 and below.

Interitem tetrachoric correlations are ordinarily low in tests for
practical use, as some examples will show. Given coefficient α or an
intertest correlation, one can apply the Spearman-Brown formula
to estimate a typical interitem correlation (Cronbach, 1951). Coeffi-
cients from the manuals for representative tests give these results:

Metropolitan Achievement Test, Advanced: Word Knowledge,
Spelling, Language Usage, Arithmetic Computation, Science.
Range of φ, .09-.25; corresponding r, ca. .15-.40.

Edwards Personal Preference Schedule. Range of φ, .04-.18;
corresponding r, ca. .06-.29.

High School Personality Questionnaire (Cattell): Scores A, B,
[illegible in source]. Range of φ, .04-.09; corresponding r, ca. .06-.15.

In each instance, the tetrachoric correlations given are those corre-
sponding to the φ range when P_i = P_j = .50; but very similar
results would be obtained for any P_i, P_j between .90 and .10.
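
The two conversions used for these examples can be sketched as follows. The first function inverts the Spearman-Brown formula to recover an average interitem correlation from a test-level coefficient. The second converts φ to a tetrachoric r for two items with P = .50, using the bivariate-normal identity φ = (2/π) arcsin r. That identity is a standard result and an assumption here, since the paper does not say how its conversion was made. Both function names are hypothetical.

```python
import math

def mean_interitem_r(reliability, n_items):
    """Invert Spearman-Brown: average interitem correlation implied by the
    reliability of an n-item test."""
    return reliability / (n_items - (n_items - 1) * reliability)

def phi_to_tetrachoric_at_half(phi):
    """Tetrachoric r matching a given phi when both items have P = .50
    (assumes an underlying bivariate normal distribution)."""
    return math.sin(math.pi * phi / 2)
```

Under this identity φ = .25 corresponds to r of about .38 and φ = .04 to about .06, in line with the approximate ("ca.") values quoted above.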

Probably no test published for practical use has tetrachoric cor-
relations in the range above .30. Among tests used for research
purposes, higher intercorrelations are most often encountered in
Guttman scales for attitude measurement. An interesting example
of another kind is the Stanford Hypnotic Susceptibility Scale, where
the items have a median r of .50 and a few item pairs have correlations
approaching 1.00 (Hilgard & Weitzenhoffer, personal communica-
tion). Sizeable intercorrelations may be more common in future
mental tests. Several investigators (e.g., Peel, 1959; Wohlwill, 1960)
have devised tests on the basis of Piaget's theories. These tests
have high internal consistency as measured by Guttman's tech-
nique; the level reached by φ_ij or r_ij has not been reported.

Our distribution of P is made uniform over the range .01-.99,
with each P specified to the nearest .01. This is a distribution more
likely to be found in personality tests than in ability tests. This
wide range of P should increase the influence in our results of any
peculiarities of φ. Limited supplementary calculations have been
made with P confined to the range .50-.99, which is more typical
of ability tests. The results are similar to those for the full range,
save that α and KR21 are in closer agreement.

We employ two values of n, the number of items: 9 and 54. The
very small sample of 9 items provides an extreme example of varia-
tion of coefficients from test to test.


Questions to be Investigated 


The intention of this study is to compare the several coefficients
with each other for various n and r_ij and, secondly, to determine
how any one coefficient varies from test to test, within a family
of randomly-parallel tests.


Our plan and assumptions are much like Brogden's, save that he
used tests with a few fixed distributions of P. Brogden compared
three coefficients:

α (KR 20);
KR2, which under the conditions of his study and ours is
identical to α_L; and
ρ_Tt, where t is the normalized value of our M.

Of Brogden's several distributions of P, the one he calls "nor-
mal" is most nearly comparable to ours. His results anticipate our
conclusions regarding α, α_L, and ρ²_TM. We are able to extend
Brogden's work by bringing ad[...]
by examining variation associ[...]
[...] in the light of reliability theory that has [...]

[...] responses are scored one or zero,
our conclusions apply to any dichotomous scores. For item i,
V_i = P_i(1 - P_i). If r = 1.00 and P_j > P_i, C_ij = P_i(1 - P_j).
If r = .70 or .30, we estimate C_ij using the formula given by Kelley
(1947, p. 383). The test variance V_T = Σ V_i + ΣΣ C_ij (i ≠ j).
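
Kelley's formula is not reproduced in the paper, so the sketch below swaps in a different technique: it obtains the covariance of two dichotomous items by direct numerical integration of the bivariate normal distribution that a tetrachoric correlation presupposes. The function name, the Simpson's-rule settings, and the use of `statistics.NormalDist` are assumptions. Only the r = 1.00 branch is taken from the text.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution

def item_covariance(p_i, p_j, r, steps=2000, upper=8.0):
    """Covariance of two 0/1 items with difficulties p_i, p_j and tetrachoric r.

    Assumes each item is a cut on a standard normal variable, "passed" when
    the variable exceeds the cut. Not Kelley's (1947) formula.
    """
    if r >= 1.0:
        # Closed form given in the text for perfectly correlated items.
        return min(p_i, p_j) * (1 - max(p_i, p_j))
    a_i = _STD.inv_cdf(1 - p_i)   # cut point for item i
    a_j = _STD.inv_cdf(1 - p_j)   # cut point for item j
    s = math.sqrt(1 - r * r)

    def integrand(x):
        # density of X at x, times P(Y > a_j | X = x)
        return _STD.pdf(x) * (1 - _STD.cdf((a_j - r * x) / s))

    # Simpson's rule on [a_i, upper]; the tail beyond `upper` is negligible.
    h = (upper - a_i) / steps
    total = integrand(a_i) + integrand(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(a_i + k * h)
    p_both = total * h / 3        # proportion passing both items
    return p_both - p_i * p_j
```

For P_i = P_j = .50 and r = .30 this gives a covariance of about .048, and for difficulties .06 and .50 about .014. Both values agree with the figures the paper uses in its later two-item illustration.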

We draw two sets of n items simultaneously. By calculating all 
variances and covariances for the 2n items and cumulating them 
appropriately, it is possible to evaluate all seven formulas of Table 1. 
The calculating routine takes roughly the following form, though 
the steps are not performed in this order. Each step save the second
is carried out for each set of items separately; we shall refer to either
set, considered singly, as T*.

1. Combining all item variances and covariances within test T*
gives the variance V_T* and hence the standard deviation σ_T*.
KR21 can now be calculated.

2. Summing crosscovariances C_ij' evaluates the numerator of for-
mula (1), which is then divided by the test standard deviations to
get ρ_T*T*'.

3. ΣΣ C_ij is obtained, summing item covariances within test
T*. This permits us to calculate α.

4. Items within test T* are classified into nine strata on the basis
of difficulty. The strata cover difficulty ranges .01-.11, .12-.22, ...,
.89-.99. Covariances are summed within strata to evaluate the
second term of the expanded form of α_S.

5. The value of α_max is brought forward from step 3 (using
r_ij = 1.00). The other two α's are divided by this to get the values
of α_H at r = .30 and r = .70.

6. A table obtained by a separate operation gives values of E_j C_ij
as a function of P_i, and E C_ij, for each r_ij. These are used to evaluate
ρ²_T*M.
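
Steps 1 to 3 can be sketched for the r = 1.00 case, where the covariance of two items is min(P)(1 - max(P)), as stated earlier in the paper. This is an illustration under stated assumptions, not the authors' routine: the function names, the dictionary of results, and the textbook forms of α and KR21 are all assumptions.

```python
import math

def cov_r1(p, q):
    """Covariance of two dichotomous items whose tetrachoric r is 1.00."""
    return min(p, q) * (1 - max(p, q))

def coefficients_r1(test_a, test_b):
    """V_T, alpha and KR21 for test_a, and its correlation with test_b."""
    n = len(test_a)

    def variance_parts(test):
        v_items = sum(p * (1 - p) for p in test)
        off = sum(cov_r1(test[i], test[j])
                  for i in range(len(test)) for j in range(len(test)) if i != j)
        return v_items, off

    va_items, a_off = variance_parts(test_a)
    vb_items, b_off = variance_parts(test_b)
    v_a = va_items + a_off                     # step 1: test variance
    v_b = vb_items + b_off
    p_bar = sum(test_a) / n
    kr21 = n / (n - 1) * (1 - n * p_bar * (1 - p_bar) / v_a)
    cross = sum(cov_r1(p, q) for p in test_a for q in test_b)
    rho = cross / math.sqrt(v_a * v_b)         # step 2: correlation between forms
    alpha = n / (n - 1) * a_off / v_a          # step 3: alpha from summed covariances
    return {"v_t": v_a, "kr21": kr21, "rho": rho, "alpha": alpha}

result = coefficients_r1([0.2, 0.5, 0.8], [0.2, 0.5, 0.8])
```

In this example α is about .69 although the interitem tetrachoric correlation is 1.00 and the two forms correlate perfectly. This illustrates why α cannot reach unity when difficulties are unequal.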


Results 
Mean Values of the Several Coefficients 


Table 2 presents the chief results for 9-item tests. Since 40 tests
were studied, two at a time, we have 20 values of ρ_T*T*' for non-
overlapping pairs of tests. There are 40 single tests for which ρ²_T*M
and the internal-consistency coefficients can be estimated. Only 20
values for KR21 were obtained, because of technical difficulties.

Values of α_S were not computed for the 9-item tests; it is not reason-
able to estimate the average within-stratum covariance when n_h is
very small.

Accompanying each mean is its standard error. This value must
[...] which may be regarded as [...] coefficients, over all possible
[...] universe. Figure 1 compares the [...] ρ²_TM. Plotting the
difference [...] with rising r, so that dis[...] those in Figure 1,
[...] about the expected values of [...] randomly parallel tests of
which these are a sample):


TABLE 2

Reliability Estimates for 9-Item Randomly Parallel Tests:
Means, Standard Errors of Means, and Ranges

                                      r_ij
Coefficient  No.   .30               .70               1.00
ρ²_TM        40    .6132 ± .0045     .8392 ± .0044     .9195 ± .0071
                   (.55-.67)         (.74-.87)         (.75-.98)
ρ_TT'        20    .609 ± .0039      .8365 ± .0041     .9085 ± .0099
                   (.58-.65)         ([illegible]-.87) (.79-.98)
α            40    .599 ± .005       .819 ± .005       .895 ± .005
                   (.51-.67)         (.74-.88)         (.82-.96)
α_L          40    .614 ± .004       .852 ± .003       1
                   (.54-.67)         (.80-.88)
α_H          40    [illegible] ± .008  .915 ± .001     1
                   (.63-.70)         (.90-.92)
KR21         20    .959 ± .040       .648 ± .048       .803 ± .021
                   (-.05-.63)        (.89-.86)         (.52-.95)


1. E E ρ_TT' and E ρ²_TM [...] closely. With strictly equivalent
tests these should be [...] the two coefficients for randomly parallel
tests has not previously been [...]

2. E α slightly underestimates E E ρ_TT' and E ρ²_TM (a conclusion
previously established by a mathematical argument; Rajaratnam,
et al., 1960a). The average difference for 9-item tests is in the neigh-
borhood of .01 or .02.

3. As expected, α_L reaches unity when r = 1.00. When r < .40,
E α_L exceeds E ρ²_TM and E E ρ_TT' by about .01 or less.

4. Except when r is 0 or 1.00, α_H gives substantially higher coeffi-
cients than any other formula; E α_H exceeds the expected intertest
correlation by as much as .05.

5. On the average, KR21 coefficients are extremely low, compared
to all others. The value is close to what α or ρ²_TM would be for a
test of three items from this universe, one-third the actual length.

Table 3 gives comparable information for 54-item tests. With a
test of this length, every coefficient is high even at r = .30. The
generalizations above again apply. There is very close agreement,
on the average, among ρ²_TM, ρ_TT', and α, even when r = 1.00. The
failure of the internal-consistency measure to reach 1.00, then,
reflects a property of randomly parallel tests which appears in other
reliability coefficients also. The average α_L exceeds these three coeffi-
cients by an entirely negligible amount when r < .70. KR21 again
gives values lower than the other coefficients. Although the dis-
crepancies are not great numerically, a discrepancy between .90 and
.84, or .97 and .94, is about equal to the change in reliability expected
from halving test length; it is by no means negligible. Again, α_H is
larger than other coefficients, departing from the between-forms
correlation and ρ²_TM by much more than α does.


[Figure 1. Means of Estimates of Coefficients for 9-Item Tests, Expressed as
Deviations from the Mean of ρ²_TM. Vertical axis: difference of coefficient
from mean ρ²_TM. Horizontal axis: interitem tetrachoric correlation
(.00, .30, .70, 1.00), with the corresponding interitem correlation φ
(.00, .14, .36, .52) and mean ρ²_TM (.00, .61, .84, .92).]


TABLE 3

Reliability Estimates for 54-Item Randomly Parallel Tests:
Means, Standard Errors of Means, and Ranges

                                r_ij (tetrachoric)
Coefficient  No.¹  .30               .70               1.00
ρ²_TM        12    .9034 ± .0011     .9689 ± .0034     .986 ± .0084
                   (.90-.91)         (.96-.97)         (.97-.995)
ρ_TT'         6    .9036 ± .0007     .9692 ± .0027     .9890 ± .0038
                   (.90-.91)         (.97-.97)         (.98-.993)
α            12    .901 ± .001       .966 ± .002       .982 ± .0005
                   (.89-.91)         (.96-.97)         (.98-.98)
α_L          12    .9045 ± .001      .972 ± .001       1
                   (.90-.91)         (.97-.97)
α_H          12    [illegible] ± .001  .983 ± .0001    1
                   (.91-.92)         (.98-.98)
KR21         12    .844 ± .004       .9365 ± .002      .964 ± .003
                   (.81-.86)         (.93-.95)         (.95-.97)

1 Number of coefficients contributing to average.


Brogden's mean v[...] Though his ρ²_Tt and [...] their numerical
values. When r = .70 and n = [...], for example, our mean ρ²_TM is
[...] and his ρ²_Tt is about .94 (interpolated from his [...]

[...] 54-item tests by random sampling, and for
each test we computed α, α_S, and α_L. The means and ranges were
as follows:

r_ij    .30               .70                      1.00
α       .900 ([illegible])  [illegible] (.96-[illegible])  .981 (.98-[illegible])
α_S     .903 (.90-.91)    .971 (.97-.97)           .996 (.995-[illegible])
α_L     .903 (.90-.91)    .971 (.97-.97)           1

These means for α and α_L differ unimportantly from the means in
Table 3. α_S computed with nine difficulty strata proves to be ex-
tremely close to α_L. This is important because Lord's coefficient is
generally impractical to compute. Only a moderate amount of labor
beyond that required for α is needed to obtain total score T_h for
each individual on each stratum, so as to compute α_S. With coarser
stratification, α_S will move away from α_L, toward α as a limit.


Variation from Test to Test 


A coefficient for a single test or a single pair of tests is often 
interpreted as an estimate of the reliability of any test of the same 
sort, i.e., of the reliability of the family of randomly parallel measures. 
For example, consistency among ratings by one set of judges is 
interpreted as indicating the reliability of ratings by any similar 
judges rather than of ratings by those fixed judges. The means in
Tables 2 and 3 are estimates of the expected values of the coefficients.
How well specific coefficients correspond to the expected value is
indicated by the ranges. In interpreting the ranges, the varying
number of tests must be taken into account. For 9-item tests the
ranges are considerable, with α's among the 40 tests ranging .08
above and below the expected value. There is nearly as large a
range for ρ²_TM and α_L. The range for ρ_TT' is smaller, as would be
expected from the fact that each value is based on 2n items. Among
54-item tests the range of specific coefficients other than KR21 is
entirely negligible.

Lord (1955a) derived sampling-error formulas for α and KR21.
He concluded that variance of these coefficients over tests is negli-
gible compared to their variance over samples of persons. Most of
our data are consistent with this conclusion, but it does not hold
for α when r_ij is large and, as in this study, there is a wide range of P.
When n = 54 and r = .70, the variance over tests of α is about
.00003. If we take the error variance of a product-moment correla-
tion as a guide, the variation over samples of 100 persons, of a
coefficient whose population value is .97, will be about .00004.
Thus the two sampling variances are similar in magnitude.

Detailed data for five pairs of tests are presented in Table 4, for
the reader who wishes to make a study of sampling fluctuations.
In order to describe the specific tests, the range of P was arbitrarily
divided into nine equal intervals, the first interval including P from
.01 to .11; we report the number of items in each interval. The
pattern 111 130 110 describes a sample of items spaced widely over
the entire range, with some predominance of more difficult items.
The largest value of ρ²_TM arises when items pile up in the middle of
the range (test 4), and the smallest when the sample consists of
extreme items (test 5). It will be seen that the internal-consistency
coefficients do not vary with ρ²_TM or ρ_TT'. For example, α is greater
than ρ²_TM for tests 4 and 5, just because the variation of P is small
within those tests.


TABLE 4
Statistics for Five Pairs of Tests*
(n = 9, r_ij = .70)

Test  Pattern       V_T     ρ²_TM  ρ_TT'  α      α_H    α_L    KR21
 1    111 130 110   6.564   8588   8487   8375   9181   8635   7441
 2    130 102 101   4.923   8318          8042   9186   8442   6271
 3    021 130 020   6.911   8624   8675   8461   9234   8663   7616
 4    001 142 001   8.369   8645          8649   9053   8804   8266
 5    233 000 100   4.546   7416   8076   8129   9001   8449   7265
 6    203 121 011   7.052   8609          8450   9146   8685   7668
 7    111 003 012   4.868   8349   8435   9036   9195   8424   6091
 8    100 220 121   5.699   8397          8205   9117   8557   6919
 9    010 013 202   6.053   8325   8448   8310   9094   8598   7533
10    021 210 022   5.267   8468          8102   9163   8480   6560

* Decimals omitted before coefficients.


There is no need to [...] coefficients when n [...] KR21) vary
negligibly. But when n = [...] r = .30. For the specific test α is .01
[...] estimator of the latter. When [...]


Discussion


Since there are numerous definitions of reliability, we shall examine
the implications of our results for each of the prominent interpreta-
tions. Though, in this discussion, we shall make some statements
about theoretical expectations that are not well-known, we shall not
attempt to justify these statements. This paper is not the place to
set forth mathematical theory.

Classical equivalence theory. The most familiar theory defines 
reliability as the correlation among equivalent tests, i.e., tests that 
have equal variances and uniform intercorrelations, and that in 
some sense measure the same thing. 

Item-parallel tests, but not randomly parallel tests, are strictly
equivalent in the sense of this theory. If we were to construct a
pair of tests having identical common-factor content and an identical
P distribution, the correlation between tests would be equal to α_L.
We may consider α_L as a criterion, from the standpoint of classical
theory, against which to judge other estimates.

We find that the correlation of T with a randomly parallel T' is
close to α_L when r_ij < .70, and underestimates α_L when r_ij approaches
1.00. The easily estimated α is a fair to excellent estimate of α_L
when r_ij < .70, though Table 4 shows some discrepancies as large
as .04 for 9-item tests. α_S agrees with α_L more closely than α, as
expected. With very fine stratification, α_S gives an excellent estimate
of α_L. The estimates given by α_H and KR21 are not acceptably
close to α_L.

Horst proposed α_H as a "realistic estimate of the reliability"
defined as "consistency of behavior within a very limited time
interval." Neither Horst nor Loevinger rationalized the proposal to
divide the obtained coefficient α or φ by its maximum value, nor did
Horst give any empirical justification for considering α_H to be
"realistic." Since his aim was to remove effects associated with the
dispersion of item difficulties, and since α_L accomplishes this, we see
no merit in his proposed coefficient, which greatly exceeds α_L.

We conclude that a person who wishes to estimate the correlation
of the given test with a strictly parallel test should compute α unless
r_ij is believed to be unusually high. In that event, α_S should be used.
The present study is confined to tests homogeneous in content;
with heterogeneous items, the difference between α on the one hand,
and α_S and α_L on the other, is expected to increase.

Generalizability over randomly parallel tests. A given test may be
regarded as a random sample of items from a pool (Cronbach, 1951;
Lord, 1955b). If the aim of measurement is to estimate the standing
of the individual on the average over all items in the universe, we
may define the desired coefficient as ρ²_T*M or E ρ²_TM, and use one or
the other as a criterion for judging other estimates.

A mathematical theory pertinent to these coefficients can be
elaborated (Rajaratnam, et al., 1960a). It can be shown that α
estimates V_M/EV_T with a negligible bias, and that this ratio is a
lower bound to E ρ²_TM. Our findings are consistent with this theory;
the mean α underestimates E ρ²_TM by less than .025, for n > 9.
Because of sampling error, however, α for any one short test may be
appreciably above or below E ρ²_TM. For the 54-item test, any α is
a good estimate of E ρ²_TM.

The specific coefficients ρ²_T*M vary negligibly about the expected
value when n = 54. There is considerable variation when n = 9.
If r_ij = .30, α for any test is an excellent estimator of the specific
ρ²_T*M for that test. No internal-consistency formula is a very good
estimator of the specific coefficient with n small and r_ij large.

The score M_p is the expected value of raw scores for person p over
all randomly parallel tests. One might reasonably be concerned
instead with the expected value of scores standardized within tests,
in which event the "criterion" for the reliability coefficient would
[...] Rajaratnam, et al. states complex condi-
tions [...] lower bound to E ρ_TT'. From that theory
[...] coefficients for the present tests to be in
close agreement, and that is confirmed by the similarity of the
[...] in Tables 2 and 3.

We conclude that if one wish[...] parallel tests, α [...] to this model.


Our results do, however, contradict a highly plausible expectation
regarding $\alpha_L$. “Item-parallel” tests conform to a highly controlled
sampling plan. One would expect two such tests to agree more
closely than tests formed by sampling within coarser strata, hence
it is plausible that $\alpha_L$ “gives a least upper bound for the test
reliability that can actually be attained (in theory) by sufficiently
careful matching of the items in parallel test forms” (Lord, 1955b).
In our data, however, $\alpha_L$ for a test $T_A$ is often less than
$\rho_{T_A T_B}$. For many pairs of tests $\rho_{T_A T_B}$ falls between $\alpha_L$ for
$T_A$ and $\alpha_L$ for $T_B$, as can be seen in Table 4. This is an
unexpected result. How can a test agree better with another test
containing the same content and unlike it in item difficulties, than
with a test matched in content and difficulty?

A study of the problem shows that, as we expect, $\phi_{ii'}$ is higher than
$\phi_{ij}$ when i, i', and j measure the same content and $P_i = P_{i'} \ne P_j$.
But $\sigma_i \sigma_{i'} \phi_{ii'}$, the covariance of matched items, may be less than
$C_{ij}$ for unmatched items, however unreasonable this seems at first
glance. Figure 2 makes this evident, plotting values of $C_{ij}$ as a
function of $P_j$, with $r_{ij}$ fixed at .30. In general, for an item with an
extreme P, and hence small variance, the covariance with a “parallel”
item will be below its covariance with an item whose P is in the
middle range.
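The relation plotted in Figure 2 can be approximated numerically. The following is a minimal sketch, assuming that each dichotomous item arises by cutting a standard normal latent variable at the threshold implied by its difficulty P, and that the latent variables of any two items correlate at the tetrachoric value (.30 here). The function names and the Simpson integration are illustrative choices, not taken from the original study.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def joint_pass_probability(p_i, p_j, r_tet, steps=2000):
    """P(both items passed) under an assumed bivariate-normal latent model."""
    a = STD_NORMAL.inv_cdf(1.0 - p_i)  # item i is passed when latent x > a
    b = STD_NORMAL.inv_cdf(1.0 - p_j)  # item j is passed when latent y > b
    scale = (1.0 - r_tet ** 2) ** 0.5
    upper = a + 12.0                   # the tail beyond this point is negligible

    def integrand(x):
        # density of x times P(y > b | x)
        return STD_NORMAL.pdf(x) * (1.0 - STD_NORMAL.cdf((b - r_tet * x) / scale))

    # Composite Simpson's rule on [a, upper]; `steps` must be even.
    h = (upper - a) / steps
    total = integrand(a) + integrand(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(a + k * h)
    return total * h / 3.0


def item_covariance(p_i, p_j, r_tet=0.30):
    """Covariance of two dichotomous items with difficulties p_i and p_j."""
    return joint_pass_probability(p_i, p_j, r_tet) - p_i * p_j


if __name__ == "__main__":
    for pair in [(0.06, 0.06), (0.06, 0.50), (0.50, 0.50)]:
        print(pair, round(item_covariance(*pair), 3))
```

Under these assumptions an item with P = .06 covaries less with a matched item of the same difficulty than with an item at P = .50, which is the pattern the paragraph describes.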

Figure 2. Relation of Interitem Covariance to $p_i$, $p_j$ when $r_{ij}$ = .30

We can trace the effect of this on test scores by considering a
two-item test $T_A$, with item difficulties .06 and .50, and a test $T_B$
with difficulties .50 and .50. The item variances are then .056, .250,
.250, and .250, respectively. The covariance for the item pair in $T_A$
is .014, that within $T_B$ is .048, and the four cross-covariances are
.014, .014, .048 and .048. The value of $\phi_{ii'} V_i$ for the first item in
$T_A$ is only .006. That for the second item is .048, so that $\alpha_L$ is
(.006 + .048 + .014 + .014)/.334 or .25. $\alpha_L$ for test $T_B$ is
(.048 + .048 + .048 + .048)/.596 or .32, and $\rho_{T_A T_B}$ is .28.
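The arithmetic of the two-item example can be retraced in a few lines. This is a sketch only: the covariance table holds the rounded values quoted in the text, the labels `T_A` and `T_B` are illustrative, and the form of the item-parallel coefficient (matched covariances plus within-test covariances, divided by the test variance) is inferred from the sums shown in the example rather than from a stated formula.

```python
from math import sqrt

# Inter-item covariances quoted in the text (r fixed at .30),
# keyed by the sorted pair of item difficulties.
COVARIANCE = {(0.06, 0.06): 0.006, (0.06, 0.50): 0.014, (0.50, 0.50): 0.048}


def cov(p, q):
    return COVARIANCE[(min(p, q), max(p, q))]


def off_diagonal_covariance(items):
    """Sum of covariances over all ordered pairs of distinct items."""
    return sum(cov(p, q) for i, p in enumerate(items)
               for j, q in enumerate(items) if i != j)


def total_score_variance(items):
    """Item variances p(1 - p) plus all within-test covariances."""
    return sum(p * (1 - p) for p in items) + off_diagonal_covariance(items)


def alpha_item_parallel(items):
    """Covariance with a hypothetical item-parallel form over test variance."""
    matched = sum(cov(p, p) for p in items)  # each item with its matched twin
    return (matched + off_diagonal_covariance(items)) / total_score_variance(items)


def intertest_correlation(test_a, test_b):
    cross = sum(cov(p, q) for p in test_a for q in test_b)
    return cross / sqrt(total_score_variance(test_a) * total_score_variance(test_b))


T_A = [0.06, 0.50]
T_B = [0.50, 0.50]
print(round(alpha_item_parallel(T_A), 2),
      round(alpha_item_parallel(T_B), 2),
      round(intertest_correlation(T_A, T_B), 2))  # prints 0.25 0.32 0.28
```

The intertest correlation (.28) lies between the two item-parallel coefficients (.25 and .32), which is the result the text calls unexpected.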

Generally speaking, if $\alpha_L$ for test $T_A$ is less than $\alpha_L$ for randomly
parallel test $T_B$, $\rho_{T_A T_B}$ will exceed the correlation of $T_A$ with a
strictly equivalent test. This is similar to the familiar principle that
validity can exceed test reliability when test and criterion measure
much the same thing, and the criterion is highly reliable. We expect
$\alpha_L$ to be particularly high for tests with large $V_i \phi_{ii'}$, that is, tests in
which the $P_i$ cluster around .40-.60. This appears to be the case in
Table 4; compare tests 3, 4 and 9 with 2, 5 and 7. We therefore
would modify Lord’s statement to say that $\alpha_L$ is the value approached
by the intertest correlation from above or below, as the alternate
form is matched more and more carefully to the given test.

All our conclusions are restricted to tests composed of items with
$r_{ij}$ uniform. They may require modification when the $r_{ij}$ vary,
even if items have a single common factor. If the universe is
heterogeneous in content, the conclusions will certainly change. (For
example, $\alpha_L$ is then expected to be appreciably larger than $\alpha$.)


A computer program to extend our work to more complex universes 
is under development. 


Summary 


[...] randomly parallel tests were constructed by sampling at
random from a pool of dichotomous items with uniform tetrachoric
intercorrelations and a rectangular distribution of difficulties. Various
formulas were evaluated for each test. It is assumed that
an infinitely large sample of persons is studied (i.e., sampling errors
[...] are ignored), and that all persons take the same [...].
In [...] our conclusions, the reader should bear in mind that in
psychological and educational tests the average item intercorrelation
is ordinarily quite low; $r_t$ averages .05-.40, and $\phi$ averages .05-.25.
[...] Conclusions should not be generalized to other
types of test, nor to tests with fewer than nine items.


We find very close agreement

a) of $E\,\rho_{TM}^2$, the expected squared correlation of test with true
score, with $EE\,\rho_{TT'}$, the expected between-tests correlation.

b) of $E\,\alpha$ with both of these ($\alpha$ = KR20)

c) of the average value of Lord's formula for item-parallel tests,
called $\alpha_L$, with $E\,\rho_{TM}^2$ and $EE\,\rho_{TT'}$ if $r_{ij}$ < .70.

d) among specific coefficients $\rho_{TM}^2$ for 54-item tests

e) of $\alpha$ with $\rho_{TM}^2$ for that specific test, if $r_{ij}$ = ca. .30.

f) of $\alpha_s$, an internal-consistency coefficient making use of the
average covariance within strata, with $\alpha_L$ for any $r_{ij}$.


If reliability is defined as the correlation between strictly
equivalent forms, $\alpha$ is a satisfactory estimate unless $r_{ij}$ is quite high, in
which event $\alpha_L$ is preferable. If reliability is defined as the expected
correlation for randomly parallel tests or the expectancy of the
squared correlation of a randomly selected test with true score, $\alpha$ is a
satisfactory estimate. Its variability from test to test is appreciable,
however, when the tests are short.

A “corrected” reliability coefficient proposed by Horst is shown
to be unsatisfactory. We also find that estimates from Kuder-
Richardson Formula 21 are too low to be acceptable as a substitute
for $\alpha$, $\rho_{TM}^2$, or $\rho_{TT'}$. For the tests studied here, with a wide range of
P, the KR21 coefficients are often in the neighborhood of the half-
test intercorrelations. Lord's interpretation of $\alpha_L$ as an upper bound
to the correlation between tests is found to be incorrect and is
modified.
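Why KR21 runs low when item difficulties vary widely can be seen from the standard Kuder-Richardson formulas. The sketch below uses a small invented score matrix (not data from the study) and population variances; KR21 replaces the sum of item variances by n times the variance of an item of average difficulty, which can never be smaller, so KR21 cannot exceed KR20.

```python
def _summary(scores):
    """Return item count, mean total score, and total-score variance (divisor N)."""
    n_items = len(scores[0])
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / len(totals)
    var_total = sum((t - mean_total) ** 2 for t in totals) / len(totals)
    return n_items, mean_total, var_total


def kr20(scores):
    """Kuder-Richardson Formula 20 (coefficient alpha for 0/1 items)."""
    n_items, _, var_total = _summary(scores)
    n_persons = len(scores)
    p = [sum(row[i] for row in scores) / n_persons for i in range(n_items)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    return n_items / (n_items - 1) * (1 - sum_pq / var_total)


def kr21(scores):
    """Kuder-Richardson Formula 21 (treats all items as equally difficult)."""
    n_items, mean_total, var_total = _summary(scores)
    return n_items / (n_items - 1) * (
        1 - mean_total * (n_items - mean_total) / (n_items * var_total))


# Hypothetical responses: 6 persons by 4 items with a wide range of difficulty.
SCORES = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]
print(round(kr20(SCORES), 3), round(kr21(SCORES), 3))
```

For these invented data the item difficulties run from 1/6 to 5/6, and KR21 falls well below KR20, in line with the conclusion stated above.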

The most important single finding is that when a test is a random
sample from a pool of dichotomous items whose content represents
a single factor, and whose mean intercorrelations are within the
normal range, the $\alpha$ coefficient is highly satisfactory as an estimate
of the intertest correlation and the percentage of variance attributable
to true score.


REFERENCES 


Brogden, H. E. “Effect of Bias Due to Difficulty Factors in Prod-
uct-Moment Item Intercorrelations on the Accuracy of Estima-
tion of Reliability by the Kuder-Richardson Formula No. 20.”
EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, VI (1946),
517-520. (a)

Brogden, H. E. “Variation in Test Validity with Variation in the
Distribution of Item Difficulties, Number of Items, and Degree
of Their Intercorrelation.” Psychometrika, XI (1946), 197-214.
(b)

Burt, C. “Test Reliability Estimated by Analysis of Variance.”
British Journal of Statistical Psychology, VIII (1955), 103-118.




Cotton, J. W., Campbell, D. T. and Malone, R. D. “The Relation- 
ship between Factorial Composition of Test Items and Meas- 
ures of Test Reliability.” Psychometrika, XXII (1957), 347-358. 

Cronbach, L. J. “Coefficient Alpha and the Internal Structure of 
Tests.” Psychometrika, XVI (1951), 297-334. 

Cronbach, L. J. and Terwilliger, J. “The Specific Reliability of a 
Test or Rater.” Paper presented to Midwestern Psychological 
Association, St. Louis: May, 1960. 

Ebel, R. L. “Estimation of the Reliability of Ratings.” Psycho- 
metrika, XVI (1951), 407-424. 

Horst, P. “A Generalized Expression for the Reliability of Meas- 
ures.” Psychometrika, XIV (1949), 21-31. 

Horst, P. “Correcting the Kuder-Richardson Reliability for Disper- 
sion of Item Difficulties.” Psychological Bulletin, L (1953), 
371-374. 

Hoyt, C. “Test Reliability Estimated by Analysis of Variance.” 
Psychometrika, VI (1941), 153-160. 

Humphreys, L. “The Normal Curve and the Attenuation Paradox 
in Test Theory.” Psychological Bulletin, LIII (1956), 472-476. 

Jackson, R. W. B. and Ferguson, G. A. “Studies on the Reliability
of Tests.” Bulletin No. 12, Department of Educational Research,
University of Toronto, 1941.

Kelley, T. L. Fundamentals of Statistics. Cambridge, Massachusetts:
Harvard University Press, 1947.

Kuder, G. F. and Richardson, M. W. “The Theory of the Estima-
tion of Test Reliability.” Psychometrika, II (1937), 151-160.

Loevinger, Jane. “A Systematic Approach to the Construction and
Evaluation of Tests of Ability.” Psychological Monographs,
LXI (1947), No. 4.

Loevinger, Jane. “The Attenuation Paradox in Test Theory.” Psy-
chological Bulletin, LI (1954), 493-504.

Lord, F. M. “Sampling Fluctuations Resulting from the Sampling
of Test Items.” Psychometrika, XX (1955), 1-22. (a)

Lord, F. M. “Estimating Test Reliability.” EDUCATIONAL AND PSY-
CHOLOGICAL MEASUREMENT, XV (1955), 324-336. (b)

Peel, E. A. “Experimental Examination of Some of Piaget's [...]

Rajaratnam, Nageswari, Cronbach, L. J., [...] “[...] Formulas
Derived fr[...]” [...] Educational Research, Univ. [...] (a)

Rajaratnam, Nageswari, Cronbach, L. J., and Gleser, Goldine C.
“[...] as Generalizability: Internal-Consistency [...]
a Stratified-Sampling Assumption.” Bureau of
Educational Research, University of Illinois, Urbana: 1960. (b)

Sitgreaves, Rosedith. “A Statistical Formulation of the Attenuation
Paradox in Test Theory.” In Solomon, H. (Ed.) Studies in Item
Analysis and Prediction. Stanford: Stanford University Press,
1961.


Tucker, L. R. “A Note on the Estimation of Test Reliability by the 
Kuder-Richardson Formula.” Psychometrika, XIV (1949), 
117-119. 

Tryon, R. C. “Reliability and Behavior Domain Validity: Re- 
formulation and Historical Critique.” Psychological Bulletin, 
LIV (1957), 229-249. 

Webster, H. “A Generalization of Kuder-Richardson Reliability 
Formula 21." EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, 
XX (1960), 130—138. 

Wohlwill, J. “A Study of the Development of the Number Concept 
by Scalogram Analysis." Journal of Genetic Psychology, XCVII 
(1960), 345-377. 


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 4, 1962


THE BASIS OF RECOGNITION AND INTERPRETATION 
OF FACTORS 


RAYMOND B. CATTELL 


Laboratory of Personality Assessment and Group Behavior 
University of Illinois 


What Uniquely Characterizes a Particular Factor? 


The recognition of factors as psychological concepts, existing in 
abstraction from many particular mathematical matrices, popula- 
tion and variable samples, and experiments, is of rapidly increasing 
importance in psychological theory and practice. One can confi- 
dently anticipate an increase of contributions examining the techni- 
cal basis for appropriate new psychological concept formation from 
such matrix factor comparisons. A recent example of crucial interest 
to personality theorists is Peterson’s (1960) article in this journal 
questioning the continuity of personality factor concepts over the 
child-adult developmental period, in the field of ratings. In the 
present writer's judgment, Peterson's negative conclusions are quite
unjustified, but his paper does a service to the field by showing how
vital these issues are to advance in personality theory and by
provoking study of the technical bases for factor recognition and
interpretation. 

At the outset it should be made clear that the present paper deals 
with only one of the two main situations in which factor identifica- 
tion is required. This is the situation when the matching is between 
two distinct experiments which do not deal with the same subjects, 
but where the same tests (perhaps mixed with others) have been 
applied to different subjects. When the problem, alternatively, is
that of dealing with different tests (hypothesized to measure the
same factor) given to the same subjects, the drawing of a conclusion
is comparatively straightforward, since the two configurations can
be brought into essentially the same space, except for slight
nonoverlap due to error, specifics, etc. Even in this latter, simplified
“common subject” situation some difficulties will arise when the 
two experiments are separated by a long time interval. R-technique 
picks out state factors as well as trait factors, and, due to function 
fluctuation, what are psychologically the same state factors will 
then no longer align in the matrices. In effect, time has made subjects 
no longer “common subjects,” and the factor alignment must then 
be pursued by the second method, which is our topic here. 

In the case of identification across different person samples, the 
present writer has elsewhere (Cattell, 1957) surveyed the full range 
of evidence needing to be considered, indicating six criteria as
follows:

(1) The nature of the factor pattern of loadings (or some related
dimension-variable relation profile) or some function thereof. What
is really a special form of this is involved in the matching of factors
by their angular position in a derived common configuration of best
mutually fitting test vectors, as the four slightly different schemes
of Ahmavaara (1954), Cattell (1955b), Tucker (1951), and Kaiser
(1960) propose.

(2) The agreement of the set of correlations of the given factor,
in its own matrix, with other (different, already identified) factors in
that matrix, with the set known to be typical for that factor, from
[...]. [...] this principle has had an interesting use
[...] validity, in the concept of Circumstantial
Validity ([...] Coan, 1958). This introduces, by contrast
with Direct Validity (correlation with the criterion),
the concept that the test shall correlate
with factors other than the criterion in the same way as
[...]. In the case of the High School Personality
Questionnaire it has been experimentally shown that the values,
[...] different factors, for direct and circumstantial
validity [...] one another very well (Cattell, et al., 1958), as
[...] expected, thus lending empirical support to
[...] criterion (2) here for factor identification.

(3) The absolute size of the mean variance contribution of the
given factor, in relation to a standard, stratified sample of variables.
(In principal components, the “latent root.”)


(4) Regard for “proof by elimination." This requires that every 
well designed factor experiment should, regularly and systematically, 
include among the variables introduced to study its own special re- 
search topic, or domain, a context of marker variables, chosen to 
represent comprehensively the well-known established factors. The 
present principle states that when $F_j$ in study X is tentatively
matched with $F_k$ from the series appearing in study Y, additional
confidence is contributed to the match by the absence of any serious
rival match for $F_j$ in Y, or for $F_k$ in X. Many published studies have
married $F_j$ in X with a spurious and second best factor match, $F_l$, in
Y, simply because the variables in the Y study did not suffice to
make the necessary $F_k$ factor, as a possible match, appear there at
all.

(5) The behavior of the factor as a psychological entity, treated 
as an independent or dependent scientific variable, when involved 
in subsequent or accompanying manipulative experiment. This
evidence can be made immediately available in the factor analysis
itself if one uses the condition-response experimental design or its
variants (Cattell, 1952).

(6) Regard for the character of the modifications in (1), (2), 
(3) and (4) above that would be predicted from (a) sampling error 
(b) real differences of population (selection) (c) changing reliabili- 
ties of variables (d) changing conditions of administration of tests, 
and (e) differences of experimental design, as between R-, P-, and 
Incremental R-techniques. The main aim of the present paper is to 
discuss the nature of the modifications from these sources, and the 
consequent preference, in, for example, loading pattern comparisons, 
for one form of index of similarity, e.g., the pattern similarity
coefficient, $r_p$, or the salient variable similarity index, s, over the
correlation coefficient r.

It seems to be insufficiently realized (and is, for example, the 
main cause of the inconclusiveness of Peterson’s (1960) arguments
in the case cited), that all six of the above should enter into any
final judgment on the psychological identity of a factor.


The Weakness of Identifying by a Simple Comparison of 
Dimension Variable Profiles 


It will be shown below that even if one were to risk basing
identification on restricted evidence from profile similarity alone,
i.e., (1) above, there would still be six different profiles deserving
consideration for any one factor. But if all were considered, the
conclusion would still be vitiated by the inadequacy of (1) above. The
prevalence of dependence on (1) suggests that we should pause to 
illustrate its Achilles heel, before proceeding to the wider construc- 
tive suggestions. 

The chief reason why identity of loading profile cannot be taken 
as proof of identity of two underlying factor influences is that it is 
theoretically possible for two distinct factors in the same matrix to 
have identical profiles. It is not only possible, but it has been pointed
out by the present writer as occurring more than once in actual data,
creating the concept of “cooperative factors” (Cattell, 1952), i.e.,
two distinct influences which affect a set of dependent variables
similarly. The only restriction to this occurrence is that neither shall
load any one variable more than 0.7, i.e., $\sqrt{0.5}$, not accounting for
more than half the variance. A simplified, extreme example is shown
in Diagram 1, where X and Y (which are drawn negatively
correlated to remind the reader that there is no necessity for cooperative
factors to be positively correlated) load A, B, C and D the same way.

Diagram 1. Phenomenon of Cooperative Factors (variables A, B, C and D
have high and roughly proportional loadings on both factors)

It is easy to see theoretically why such factors should arise, and 
several instances are known in psychological and other data. For 
example, in a sufficiently varied array of psychological and physio- 
logical variables the sympathetic and parasympathetic system fac- 
tors (the latter reversed) will pick out almost the same set of 
salients, while in meteorological data, with human psychological
and physiological variables, one might expect that “cold” and
“damp” factors would be highly cooperative. Among well known
psychological instances one can cite, from abilities, the several 
memory factors (Kelley, 1954) and the two spatial ability factors. 
In a diversified matrix the two spatials are similar because they 
"stain"—to use a microscopist’s metaphor—mainly spatial variables, 
having zero loadings on others. In personality, the cooperation of 
schizothymia, A(—), and threctia, H (—), factors has long been 
noted (3), and a consistent similarity of loading pattern of
questionnaire factors for ergic tension, $Q_4$, and guilt proneness, O,
centering on about +0.4, has been noted in both American and Italian
populations (Cattell & Meschieri, 1960).

Incidentally, the superimposed configuration methods, mentioned 
above, in Ahmavaara’s “unified factor method,” (1954), Cattell's 
confactor rotation (1955b, Cattell & White, 1962), and Tucker's 
factor synthesis (1951) do not deliver us from this difficulty any
better than simple loading correlation tests. The only escape from
possible mistake of identity lies in (a) finding variables that load above
0.7, (b) so accurately estimating the number of factors that both are
guaranteed to appear in each matrix, and (c) designing so great a
variety of variables that the chance of finding those on which
cooperatives will finally differ is raised.

Although our introduction in these two sections has thus stressed 
the importance, in factor identification, of the total context, under 
six headings, the space of the present article is such that the rest of
it must be given to concentration only on one: the pattern of the
dimension upon the variables, as stated in (1) above.


Definition of the Six Main Variable-Dimension Relations


Probably four-fifths of all published factor analyses talk about
factor loading patterns when strictly their tables present only
reference vector correlations. But, in any case, there are at least six
alternative possibilities of expressing the profile of variables upon a 
dimension, and it is time that they became clearly recognized. For 
they could have different degrees of usefulness for identifying, i.e., 
matching, discovered factors, and they certainly have different 
meanings for purposes of interpreting the influence revealed by a 
factor. The latter has hitherto received less attention. 
Traditionally, a factor has been primarily and unquestionably
expressed and recognized, e.g., by both Spearman and Thurstone,
by approach (1) above and in terms of a profile of loadings. How-
ever, Spearman had not yet met the serpent in Eden which talks of
oblique factors, compelling a choice for good or evil between the
loading and the correlation profiles. With orthogonal factors the
latter are identical, and, as for other profiles, there is then only the
profile of weights for estimating factors. Consequently, researchers
[...]

[...] in (3) above. Initially, we proposed that [...] by applying
$r_p$ (Cattell, [...] the intra-class correlation (Haggard, 1958)
instead, but [...]

In our own cross-validating and matching [...] we have sought
maximum ob[...] With this brief [...] stated that our discussion of
[...]
"rotations to simple structure" as the principle of unique resolution 
in each experiment before matching, the matching tests actually 
apply to factors rotated in any way. Indeed, one of the important 
uses of a properly developed identification (matching) procedure, 
over and above clarifying urgent psychological conceptual decisions,
is to test the goodness of various alleged principles for reaching
unique, accurately-replicating resolutions, e.g., simple structure or
confactor rotation. In particular, better matching tests might test 
the present writer’s claim that no rigidly orthogonal solutions can 
yield such good validity (matching) across experiments as oblique 
solutions. 

The six (or more, with special derivations) patterns available in 
the general oblique case for expressing a dimension will be called, in 
the absence of any yet recognized generic term, dimension-variable 
(henceforth, D-V) profiles. As variously described in factor analytic
textbooks (Burt, 1940; Cattell, 1952; Fruchter, 1954; Harman,
1960; Thurstone, 1947), they are:

(1) Reference Vector Structure: Matrix $V_{rs}$ (r for reference
vector). By agreed convention (Cattell, 1952; Fruchter, 1954; Harman,
1960; Thurstone, 1947) “structure” always refers to correlations,
while “pattern” (in this technical context) has referred to loadings.
The reference vectors (RV's in Diagram 2) are the vectors drawn
perpendicular to the hyperplanes obtained in the usual “factor
plots” by visual rotation. The correlations of the variables with these
unit length vectors are geometrically represented by the
perpendicular projections of the test (variable) vector on the reference
vectors. (Test oX has projections oa and ob respectively on $RV_1$ and
$RV_2$ in Diagram 2.) Oa, and similar values for other tests, would
constitute the column for a particular RV in the RV structure
matrix, $V_{rs}$, which is the commonly published table in factor analytic
researches.

(2) The Reference Vector Pattern: Matrix $V_{rp}$. The column for
the RV as it appears, instead, in the $V_{rp}$ matrix consists of the
loadings (not correlations) of the same variables on the given RV.
Loadings have the same meaning as beta coefficients or standard
weights for estimation (of a variable) in a multiple correlation.
They are the weights to be assigned to the scores on various RV’s
in getting the best estimate of the variable in question. As in the
multiple r, these loading weights for estimating this “criterion”
become modified from the actual correlations with the criterion
(variable), according to the degree of correlation among the
contributing factors themselves. The loadings of the RV's with respect to
the variable oX are shown by oc and od in Diagram 2.

(3) The Factor (sometimes called¹ “Primary” Factor) Pattern:
Matrix $V_{fp}$. In the oblique case the reference vector, RV, and the
corresponding factor, F, are not collinear, but represent different
(though usually substantially correlated) entities. The RV, as
stated, is the normal to the hyperplane. The corresponding F is the
line of intersection of all hyperplanes other than that belonging to
the reference vector (and factor). In the simplest case of a two
factor problem, representable in two-space, the RV and F appear as
shown in Diagram 2 (from p. 318 of the writer's book (1952)). The
reader unfamiliar with mathematical conventions in this area should
note that the projection on an oblique axis is obtained by a line, e.g.,
Xg in Diagram 2, parallel to the second factor axis, not by a
perpendicular to the first axis.²

Diagram 2. Relations of Factor, Reference Vector and Unrotated
Centroid Projections

¹ [...] would aid clarity if writers would drop this use of “primary,”
for “primary” is better used to distinguish first-order from
second[...] factors. F and RV sufficiently distinguish what we
are [...]

The meaning of the loadings in the $V_{fp}$ matrix is, of course, that
they are the weights to be given to factor scores in making the best
estimate of the variable. Each row in such a matrix gives the
coefficients (beta weights) in the familiar “factor specification
equation” for any given variable. There is a very simple relation between
the loadings of a series of variables on the factor, and the correlations
of the same series of variables with the corresponding RV. Indeed, it
is a simple proportionality, obtained by multiplying all the way
down the reference vector correlation column by a ratio, d, peculiar
to each factor, yielding the corresponding factor loading pattern.
This is readily seen by comparing oa and ob (correlations) in
Diagram 2 with of and og (loadings), where d is the reciprocal of the
cosine of the angle bog between factor and RV (No. 2).

(4) The Factor Structure: Matrix $V_{fs}$. This represents the
correlations of the variables with the factor, as instanced by om and on
(Diagram 2), representing a perpendicular projection of the variable
upon the factor. The loadings (pattern) on the reference vector have
exactly the same simple proportionality to the correlations with the
factor, in the present $V_{fs}$, as the loadings on the factor have to the
correlations on the reference vector, described in (3) above. But
there is no such simple proportionality between the series of
loadings and correlations for a single variable, i.e., for any
corresponding rows in the F and the RV systems.

² In Diagram 2 it is assumed that the RV's 1 and 2 have been rotated to
simple structure by the usual plots, beginning with the drawing from
projections on the principal axes or centroids ($C_1$ and $C_2$), obtained directly by
factorization. (The orthogonal centroids are shown only very curtailed in
Diagram 2, for clarity of the drawing.) Only two variables, X and Y, are
taken from among those not in the hyperplane, and the lines to the points
(variables) which constitute the hyperplane are not drawn, in order to keep
the drawing uncluttered. A correlation, as usual in the spatial representation,
is the cosine of the angle between one variable vector and any other vector,
multiplied by the root communalities (lengths) of the two vectors concerned.
Since the RV's and F's are of unit length, this cosine is simply the projection
of the variable on the RV or F vector, as shown at oa and om. For the
rest, Diagram 2 is self-explanatory.

³ Though practitioners in factor analysis often loosely refer to [...],
the RV and the F are strictly different. For example, the correlations
among the RV's are quite different from, though related to, those among the
F's. Psychologists first encountering this duality tend recurrently to seek
simplification by abandoning the F's and working only in the RV system, since the
[...] what one first gets! But this is really no solution. For example, if
one adds up the test scores of items high on a factor, e.g., in obtaining
scores for the various factor scales, he will find that the correlations
[...] in experimental work with these scales are actually, essentially (except for
estimate error), those obtained among the F's in these plots, not those among
the RV's. Complicated though it may seem, the psychologist has to live
with, and use the extra information from, both reference vectors and
factors.

(5) The Factor Estimation Matrix, $V_{fe}$. When one comes to
estimate the factor scores from the normalized test scores, he needs
the beta weights of tests on factors. These we will call “estimation
weights” to distinguish them from the reciprocal beta weights of
factors on tests which we call “loadings.” This factor estimate
matrix is written $V_{fe}$, using the e (estimate) subscript to refer to the
factor estimate and to distinguish it from the factor pattern, $V_{fp}$,
oriented to the variables.

(6) The Reference Vector Estimation Matrix, $V_{re}$. Analogously,
one can write a matrix for estimating an individual’s RV scores
from his test variable scores. Except among a very few psychologists
who have decided to handle everything in reference vectors there is
little practical use for this, and it is added here only for symmetry
and completeness of concepts.


Diagram 3 shows why, logically, these six kinds of dimension-
variable connections are exhaustive. For the possible matrices
derive from two kinds of dimensions, each connected with the variables
in three possible ways, namely, correlation, and estimation, the
latter in each of two directions. The estimations of one kind of
measure from another are represented by dotted arrows, pointed in
one of the two possible directions of estimate, while a correlation,
being different in nature, has a continuous line, double headed
because the relation is symmetrical.

Diagram 3. The Six Basic Matrices Relating Dimension and Variable


Derivations, Transformations and Meanings of the Six DV Matrices.


Since extremely few psychologists seem to be familiar with the 
meanings of these six alternative DV statements, obscurely located 
even in textbooks, and not brought into systematic relations, it 
should be helpful to give a brief resumé of their derivations before 
proceeding. In so doing it may be desirable to use a notation some- 
what different from Thurstone’s and the mathematician’s. This sym- 
bolism uses subscripts directly reminding the user of the verbal 
definition, and has proven itself over twenty years to be much easier 
and unambiguous for communication among psychologists. Therein
V is used for any dimension-variable matrix, while the orthogonal
unrotated centroid is $V_0$, and the rotated factor (f), structure (s)
matrix is $V_{fs}$, and so on. Except for this special development within
the dimension-variable matrices, the symbols are traditional, as
follows:


N = the number of people in the sample.

R = the original correlation matrix of variables.

$V_0$ = the centroid (or principal axes matrix obtained by using
communalities) of k factors.

$V_{rs}$, $V_{fs}$, $V_{fp}$ and $V_{fe}$ = transformed dimension-variable matrices
as designated above.

$\lambda_{rs}$ (and potentially $\lambda_{rp}$, $\lambda_{fs}$ and $\lambda_{fp}$) = the corresponding
transformation matrices, which convert from the $V_0$, as in (1) below.
(Normally, of course, the last three are not used, and are added
here only for symmetry. These are derived from $\lambda_{rs}$, e.g.,
$\lambda_{fp} = \lambda_{rs} D^{-1}$.)

$C_r$ and $C_f$ = the cosine (or correlation) matrices among the
reference vectors and factors, respectively.

D = the diagonal matrix obtained from inverting $C_r$ and
proceeding as shown in equation (3) below. (The entries therein are
correlations between RV's and F's.)


As every student knows, the first step after extraction is to proceed
from the V0 to the unique rotational position, by one of various
principles of resolution, finding the lambda matrix in:


(1)  $V_{rs} = V_0 \lambda_{rs}$




When factor experiments stop and publish results thus, i.e., in the
reference vector system, it is commonly overlooked that the high
and low "loadings" by which one proceeds to interpret the factor,
and possibly also the simple structure position reached, would
have been different if some dimension-variable relation system other
than Vrs had been used. For instance, the Vrs values are different
from the factor loadings, Vfp, and from the weights to be given to
tests in estimating the factors, Vfe. True, there are certain
proportionalities: for example, the RV correlations, Vrs, are simply
proportional to the F loadings, Vfp; but the relative sizes of factors
(columns), in the sense of (3) discussed as a criterion of matching
above, are quite different. Even with all values in corresponding
columns proportional, the patterns of loadings seen from the
standpoint of the variables, i.e., the rows, are quite different.

Although the Vrs is meaningful for some purposes, one commonly
needs to proceed to the factor pattern matrix, i.e., the loadings on
the variables. This requires a slightly roundabout step, in which one
first finds the correlation matrix among the reference vectors, Cr,
by:


(2)  $C_r = \lambda_{rs}' \lambda_{rs}$

and then proceeds to:*

(3)  $C_f = D C_r^{-1} D$


With this Cf, one is in a position (having the incidentally obtained
D) to handle all possible transformations among what we have called
the family of six dimension-variable relationships. In the following
equations each DV is first shown as immediately derived from its
predecessor, though the direct route from V0, the starting point, is
shown by a second equation in each case.


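(4)  $V_{fp} = V_{rs} D^{-1} = V_0 \lambda_{rs} D^{-1}$
(5)  $V_{fs} = V_{fp} C_f = V_0 (\lambda_{rs}')^{-1} D$
(6)  $V_{rp} = V_{fs} D^{-1} = V_0 (\lambda_{rs}')^{-1}$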
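
The chain from the unrotated matrix through equations (1) to (6) can be
traced numerically. What follows is only a minimal sketch in plain
Python, not the computing procedure used for the tables in this paper.
The two-factor matrices `v0` and `lam` are invented for illustration,
and the helper names (`matmul`, `inv2`, `diag`, `dv_family`) are
hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inv2(m):
    """Invert a 2 x 2 matrix (enough for this two-factor illustration)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def dv_family(v0, lam):
    """Return Vrs, Vfp, Vfs, Vrp plus Cr, Cf and the diagonal of D."""
    v_rs = matmul(v0, lam)                        # (1) Vrs = V0 * lambda
    c_r = matmul(transpose(lam), lam)             # (2) Cr = lambda' * lambda
    c_r_inv = inv2(c_r)
    # d values: reciprocal square roots of the diagonal of Cr^-1
    d = [1.0 / math.sqrt(c_r_inv[j][j]) for j in range(2)]
    big_d, big_d_inv = diag(d), diag([1.0 / x for x in d])
    c_f = matmul(matmul(big_d, c_r_inv), big_d)   # (3) Cf = D * Cr^-1 * D
    v_fp = matmul(v_rs, big_d_inv)                # (4) Vfp = Vrs * D^-1
    v_fs = matmul(v_fp, c_f)                      # (5) Vfs = Vfp * Cf
    v_rp = matmul(v_fs, big_d_inv)                # (6) Vrp = Vfs * D^-1
    return {"v_rs": v_rs, "v_fp": v_fp, "v_fs": v_fs, "v_rp": v_rp,
            "c_r": c_r, "c_f": c_f, "d": d}

# Hypothetical unrotated loadings (3 variables, 2 factors) and a
# transformation matrix whose columns are of unit length.
v0 = [[0.7, 0.1], [0.6, 0.5], [0.0, 0.6]]
lam = [[1.0, -0.6], [0.0, 0.8]]
result = dv_family(v0, lam)
```

In this made-up case the reference vectors correlate -.60 and the
factors +.60. The first variable is loaded -.425 by the second factor
yet correlates +.10 with it, and the third variable has a zero loading
on the first factor but correlates +.36 with it.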


* First one inverts the reference vector correlation matrix to get
Cr^-1. The rows and columns of this must be multiplied by a series of
d values to bring the vectors to which they correspond back to unit
length. These d's are found by taking the reciprocal of the square
root of the diagonal values, and building a diagonal matrix D. When
this pre and post multiplies the Cr^-1 it "normalizes" it, as shown in
(3). Another way of seeing these d ratios, pointed out above, is that
they are the correlations between the reference vectors and the
factors.




The factor and reference vector estimate matrices are derived:


(7)  $V_{fe} = R^{-1} V_{fs} = R^{-1} V_0 (\lambda_{rs}')^{-1} D$
(8)  $V_{re} = R^{-1} V_{rs} = R^{-1} V_0 \lambda_{rs}$
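
The estimate matrices are ordinary regression weights, obtained by
applying the inverse of R to a structure matrix. The sketch below
illustrates this for a single factor. It assumes the usual
least-squares form, weights = R^-1 times the variable-factor
correlations; the correlations are invented, and `solve` is a
hypothetical helper written for this illustration.

```python
def solve(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Hypothetical correlations among three variables, and their
# correlations with one factor (a single column of Vfs).
r = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
v_fs = [[0.8], [0.6], [0.3]]

# Estimation weights for the factor, i.e. one column of Vfe.
v_fe = solve(r, v_fs)
```

The same call with a column of Vrs in place of Vfs would give the
corresponding column of Vre.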

To get some idea of the magnitude and nature of the changes in
these transformations, before discussing psychological meaning, let
us watch them through a real and reasonably typical psychological
example. The source traits of intelligence, anxiety and extraversion
have had much theoretical debate lavished on their natures. Let us
therefore consider them as they appear through objective tests, in
one of the most accurate and long rotated factor analytic studies
available, on a sample of 500 Air Force cadets (conducted under
the sponsorship of Sells) (Cattell, 1955a). These factors are chosen
because they are in fact pretty typical of personality factors, in
having only moderately oblique simple structure relations
(extraversion and intelligence are virtually orthogonal). Consequently
the Vrs and Vfp, which have been arranged side by side in Table 1
so that their proportionality can readily be seen, are almost identical.

This table suffices to bring out the essential fact that the four
matrices (and, one may add, the two not shown) are all different,
though the relatively slight obliquities (.002 to .139) make the
differences quite small. With larger correlations among factors one
often deals with loadings well over unity (a possibility of which many
experimenters seem unaware) in the Vfp, though none happen here.
However, the table shows that a variable may correlate with a factor
significantly (taking r = .10 as our limit for significance) yet
not be loaded by it significantly, and vice versa. For example, poor
classification test performance (M.I. 51) is loaded by the anxiety
factor (-.143) but is not significantly correlated with it (-.069),
and the same is true of poor numerical performance (M.I. 199).
The somewhat uncommon, but possible situation in which a variable
is positively correlated with a factor but negatively loaded on it (or
vice versa) is illustrated by annoyability (M.I. 211) in regard to
introversion (U.I. 32). Let us point out without waiting for the
general discussion that the simplest way to conceive the general
instance where variable X is zero or negatively loaded by Fj but
positively correlated with it, is that X really has zero or negative

——— —— 

The writer is indebted to Research Assistant Peter Schoenemann for the
transformation calculations here and in Table 3.




TABLE 1
Degree of Resemblance of Factor Pattern, Factor Structure, Estimate
Weights, and Reference Vector Structure on Intelligence,
Extraversion-Introversion, and Anxiety

[The body of this table, printed sideways in the original, is not
legible in this copy. It gives four DV values (Vrs, Vfp, Vfs and Vfe)
for each of nine variables on U.I. 1 (Intelligence), U.I. 32
(Introversion-Extraversion) and U.I. 24 (Anxiety). The variables are
Numerical Performance (M.I. 199), Classification (M.I. 51), Short
Time Memory, 16 P.F. (F-) and (H-) Scales, Fluency: Dreams/Total,
Annoyability (M.I. 211), Readiness to Confess Frailties (M.I. 219) and
Tendency to Agree (M.I. 152). A note gives the correlations among the
three factors.]




direct connection with Fj but is positively influenced by another
factor, Fk, which happens to be appreciably positively correlated
with Fj. Factor Fj really has no dealings with X, but since it is
correlated with Fk, it is necessarily correlated with what Fk has at its
command. Just so, Company A may not employ Mr. X, but if its
dealings strongly affect Company B, which does employ him, its
behavior is correlated with what happens to Mr. X. There is an
assumption here that a factor operates as a cause, which we shall
more explicitly examine and support in a moment.
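
The case just described can be put in one line of arithmetic. As a
sketch only, using the standard relation between pattern and structure
values for two correlated factors: the symbol p for a loading, the
subscripts j and k for the two factors, and the numerical values are
all assumptions chosen for illustration.

```latex
r_{Xj} = p_{Xj} + p_{Xk}\, r_{jk}
       = 0 + (.50)(.40) = .20
```

Here X has no loading on the first factor, yet correlates .20 with it
solely through the second factor, with which the first correlates .40.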

What Table 1 also clearly brings out is that although the simple
structure position would be the same in the reference vector
structure (Vrs) as in the factor pattern (Vfp) rotations (though the
hyperplane width might be slightly different) it is not the same in the
other DV systems. Three out of the nine variables (M.I. 51, 199 and
219) would demand that the simple structure be differently placed
in the factor structure (Vfs) and the factor pattern (Vfp). Also the
table demonstrates that the weight to be assigned to a variable in
estimating a factor (contributing to a battery for the factor) can
differ appreciably from its factor loading. For example, Annoyability
(M.I. 211) loads the anxiety factor (U.I. 24) about twice as high
as does Tendency to Agree (M.I. 152), but in measuring anxiety
(Vfe), it is three times as important. Similarly, extraversion affects
(loads) surgency (F factor) more than it does parmia (H factor on
the 16 P.F.), but parmia contributes more validly to the measurement
of the second-order extraversion factor. Yet not only in estimation
but also in "interpretation" many psychologists have arbitrarily
confined themselves to one of these DV matrices without any
theoretical discussion or justification of their choice.

If we set out to do any classification among these DV systems it
would be natural to regard the dimension-estimate systems, Vfe and
Vre, as more different from the remaining dimension-variable systems
than they are from each other, which is why Vfe (representative
of Vfe and Vre) has been set a little apart in Table 1 from the
representative examples from the other systems. One important cleavage
separating the Vfp and Vfs systems from the Vfe is that they tend to
be more stable from experiment to experiment than the Vfe, since the
content of the company of factors around a particular factor will
generally be more constant than the list of variables which the
experimenter happens to use. (The variables are anybody's choice;



but in a given domain of variables the factors are given by nature.)
The reason for the Vfe instability is that the addition of a new
variable, Y, closely correlated with an old one, X (the Vfe value for
which was determined in a previous experiment), while it will not
affect the Vfp and Vfs values, will seriously affect the Vfe value, due
to the usual effect on a multiple r beta weight of introducing a
variable having a high correlation with an existing predictor.
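
The instability of the estimation weights can be seen in a small
numerical case. The sketch below is illustrative only: all the
correlations are invented, the weights are taken as ordinary
least-squares beta weights (R^-1 times the variable-factor
correlations), and `solve` is a hypothetical helper.

```python
def solve(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# First experiment: variables X and Z only. X correlates .60 and
# Z .50 with the factor; X and Z correlate .20 with each other.
r_before = [[1.0, 0.2],
            [0.2, 1.0]]
weights_before = solve(r_before, [[0.6], [0.5]])

# Second experiment: a new variable Y is added (order X, Y, Z).
# Y correlates .90 with X and, like X, .60 with the factor.
r_after = [[1.0, 0.9, 0.2],
           [0.9, 1.0, 0.2],
           [0.2, 0.2, 1.0]]
weights_after = solve(r_after, [[0.6], [0.6], [0.5]])

x_before = weights_before[0][0]   # weight of X without Y in the battery
x_after = weights_after[0][0]     # weight of X once Y has been added
```

Under these assumptions the correlation of X with the factor is .60 in
both experiments, but its estimation weight falls from about .52 to
about .27 when Y joins the battery.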


First Principles in Factor Identification and Interpretation: 
Factors As Influences 


Our aim is now to ask what the meaning of these DV matrices, (or 
of more apt ones that might be developed) can be for factor interpre- 
tation. Our purpose is first to glance at possible novel concepts, 
though before doing so we will summarize the above discussion of 
properties of the commonly accepted six DV matrices, in Table 2. 

An obvious realm of new possibilities is simply the hybrids among
these six (⁶C₂ = 15, if we pair them, or ⁶C₂ + ⁶C₃ + ⁶C₄ = 50, if
we take all possible interactions). From inspecting these we have
concluded that most would be statistically more complex than
meaningful, or too esoteric for general research purposes, though some
might please certain investigators by "averaging" the importance to
be given to various systems. For example, the hybrid of the Vfp and
Vfe would be a mean expression of the weight of a factor in estimating
a variable and of a variable in estimating a factor and might
seem to some psychologists a pivotal expression of the factor-variable
relationship and a suitable system in which to seek simple structure.
Or, again, the geometric mean of the Vfp and Vfs values (that of the
Vrp and Vrs values would be the same)


TABLE 2
Summary of Six Basic DV Matrices and their Symbolization

Form of Relation               Referring variables to   Referring variables to
                               FACTORS (F's)            REFERENCE VECTORS (RV's)

Correlation                    Factor Structure         Reference Vector Structure
                               Vfs                      Vrs

Estimation of dimension        Factor Estimate          Reference Vector Estimate
                               Vfe                      Vre

Weight in estimating variable  Factor Pattern           Reference Vector Pattern
                               Vfp                      Vrp




would be an expression that would put into the hyperplane both the
items with zero loading on the Vfp and those with zero correlation
on the Vfs (since whatever is zero in one would be zero in both).
Parenthetically, this last matrix, which we encounter in another
connection below, might gain dangerous popularity with the
unsophisticated, because in the search for a maximum simple structure,
i.e., one "clearly" zero in many entries, its penchant for producing
many zeroes certainly attracts the eye!

Our choice of any DV system, old or new, is going to depend, from
this point on, upon its aptness and dependability for factor
identification (matching) and interpretation (hypothesis testing). As
far as identification is concerned it might seem, since all DV's are
mutually transformable, that one is always as good as another.
However, this would overlook that matching is undertaken always
after a unique resolution has been achieved, and this unique position
(whether by simple structure or by confactor rotation (Cattell
& White, 1962)) will be different if the principle is expressed in
different DV systems. For example, the simple structure position in
terms of factor correlations, Vfs, is different from that reached by
getting simple structure in factor loadings, Vfp, and conceivably, one
will be found to yield a constancy of meaning of any factor from
experiment to experiment which the other will not; indeed, we argue
so below. To stress that "all are mathematically equivalent" is
therefore to miss an important point.

Similarly, but more so, in interpretation, our insights rest on
contrasting the large and small figures (loadings, correlations,
weights) and these differ with different DV's. The position of Kaiser
and some mathematical statisticians that our question, "Which DV
system is most appropriate for identification and interpretation?" is
"meaningless" overlooks the differences between the mathematician's
and the scientist's restrictions. For a scientist it could be that
instead of five matrices being "redundant," all are necessary,
particularly the relationships among them, to complete interpretation,
though one may be best for the most common scientific meaning of a
factor, i.e., an influence.

This mention of "influence" brings us to the crux of the problem,
that most factor analysts have been unable to state explicitly what
their assumptions are about the scientific, conceptual status of that
which appears in the model as a mathematical factor. So long as
factor analysis is treated as statistics only, presenting simply a




statement of "shared variances" (factor-variable correlations,
regressions, weights), one is at the mercy of the classical dilemma in
covariance that A may "cause" B, B may "influence" A, or some
third thing may be the independent variable to both.

The present writer's theoretical position, in contrast to that of
many factor analysts, who, like Burt (1940), use factor analysis as
a purely classificatory device, or, say, like Wherry (1957) and
Wrigley (1956), as a relatively arbitrary orthogonal grouping for
clusters of variables, has always been that a factor (but not a
cluster) is a cause. If one adopts the position that a factor is a cause
or influence, one must introduce into factor analytic procedures
various restrictions and technical conditions consistent with this
theoretical position, and which rule out many kinds of factors as
understood and accepted in the mathematical meaning of the term.
However, the introduction of these more specific conditions adds
considerably to our understanding of such concepts as simple
structure, factor correlation, second-order factors, etc., which,
without them, are arbitrary rules or powerless concepts rather
pointlessly adopted by their users.

Granted that a factor is an influence, then simple structure takes
on a more precise meaning, which permits us to choose among the


In taking this position I [...] that within the circle of evidence
of a single factor analysis [...] ulterior evidence from comparisons
of several factor analyses and general experiment [...]
(time-sequence-less) factor analysis itself there is embedded spatial
evidence analogous to that in a "digging" from which [...] sequence
of cause and effect without actual time observations. [...] the error
of "reifying" factors. Obviously, [...] (the speaker confuses the term
with deifying!) [...] does not "reify" it any more than one [...]
weight, an electron, temperature, a gene, or any other scientific
construct.




various DV matrices. Any influence, such as scientists meet in most
of the physical and life sciences, is unlikely to operate on more than
a small fraction of any truly comprehensively, representatively
chosen set of variables. This is the justification for locating a factor
by simple structure, defined as a maximizing of the number of zeros
in a DV matrix (Cattell, 1952), and, for decisions, it points to use of
the Vfp matrix. Again, when a real influence operates with different
power in two different populations it is likely to influence the same
variables but to affect all of them proportionately more in one than
the other. This is the justification for locating a factor by confactor
rotation (Cattell, 1955b; Cattell & White, 1962). Since both
simple structure and confactor resolutions thus derive from the same
postulate of a factor as a cause, it is not surprising that they tend
experimentally to give congruent results (Cattell & White, 1962).
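
The simple structure criterion, a maximizing of the number of zeros in
a DV matrix, amounts in practice to a hyperplane count. A minimal
sketch follows. The band of plus or minus .10 is an assumption,
borrowed from the limit of significance used for Table 1, and both the
function name and the pattern values are hypothetical.

```python
def hyperplane_count(matrix, width=0.10):
    """Count the entries of a DV matrix lying within +/- width of zero."""
    return sum(1 for row in matrix for value in row if abs(value) <= width)

# Hypothetical factor pattern for four variables on two factors.
v_fp = [[0.62, 0.03],
        [0.55, -0.08],
        [0.02, 0.48],
        [-0.05, 0.41]]

count = hyperplane_count(v_fp)                 # entries in the +/- .10 band
narrow = hyperplane_count(v_fp, width=0.04)    # entries in a narrower band
```

Comparing such counts for alternative rotational positions, or for the
same position expressed in different DV systems, is one way to make the
criterion concrete.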

From this theoretical position on the nature of a factor it follows
that simple structure would be expected to hold, among the various
DV matrices, in the Vfp (or Vrs), but not in most others, and
specifically not in the Vfs. For the loadings in the Vfp state the
extent to which a factor affects a variable while the Vfs says only how
much it is correlated with it. The latter, as pointed out earlier, shows
not only how much the given factor affects the variable but also
how much the factor is correlated with other factors which affect
the variable. The weights in the Vfe state conversely how much a
variable affects a factor, and are clearly not what we want. On this
stated basis the Vfp would also be preferred for interpretation (the
Vrs is "out" because the RV intercorrelations are not those of the
actual factors when set up, e.g., as scales, and intercorrelated). For
the Vfp tells us which variables the factor affects powerfully
(positively or negatively) and which it leaves quite untouched. From
this, it follows that the greatest constancy of pattern from study to
study will obtain when simple structure is expressed in the Vfp.

So far, our conclusions agree with the position of Thurstone (who
always sought simple structure, first, in the Vrs). But, although we
may agree as to the choice among the traditional six DV's it behooves
us to glance briefly at the possible implications of our position also
for new or hybrid matrices, one of which might prove still more apt
than the Vfp. We have agreed that there is no point in interpreting a
random position which cannot be found as a recurring scientific
entity (such would be the Vrp and the Vfs), or a DV profile which




alters with the particular company of variables in which our main
variables happen to occur (such would be the Vfe). But at this point
we should note that the presently preferred DV, the Vfp, also has its
instability, namely that its values can vary with the factor company
in which the given factor is studied. The next section will ask how
serious this is and what further DV forms might rise superior to the
difficulty.


Improved Interpretation Through the Factor Influence Matrix
and the Dissociated-Associated Factor Contribution Matrices


There are two senses in which other factors affect the DV matrix 
values for a given factor: (1) In the sense that if they operate on 
the variables at all, as extra sources of variance, the variance con- 
tributions (loadings) of the considered factor to its characteristic 
marker variables are reduced, and (2) In the sense that being there, 
their particular variances and intercorrelations determine these rela- 
tional values in a particular way. 

Looking at the more fundamental sense first, we realize that if
each of the other factors in a factor matrix were taken out of the
picture, either by (a) partialing it out statistically from both the
factor, F, and the variables, V's, in the DV correlations of the
[...], or (b) holding it constant by manipulative experiments,
the [...] would look entirely different. Then any significant
[...] correlation of the factor under consideration, appearing
[...] factor matrix, would in fact climb to unity. That
[...] become +1.0 and all significant negative
[...] rest would become essentially zero. (The
only [...] for such clear unities and zeros to appear is
[...] relation of the factor to the variable be linear.)
[...] U.I. 24 (Cattell, 1955[...]; Cattell [...], 1957) may have
a [...] significant loadings on this [...] and possibly also for the
Diurnal Fatigue factor [...] Diurnal Fatigue factor, Age, and possibly
[...] were held constant, and people's [...] were then plotted, we
should expect a well fitting curve (except for errors of measurement)
directly relating anxiety to tremor in which the observed variance of
the latter




is fully accounted for by the variance of the independent variable
(r = 1.0).

This basic truth that a factor really either affects or does not
affect a variable, and that the DV matrices are only complex
statements of this ultimate relation, in terms of the equilibrium or
outcome of competitive determination from many operating factors, is
often completely lost from view in statistical discussions.
Consequently, to keep it as much in view of statisticians as it is of
manipulative bivariate experimenters, or scientific psychologists
generally, let us introduce and define a seventh fundamental DV
matrix, which we shall call the basic relation or qualitative factor
influence matrix, Vfi. This,⁷ the result of partialing out or
manipulative experiment, will have nothing but unities (plus or minus)
and zeros in it, and will indicate whether the factor ultimately affects
positively, negatively or not at all, individual members of any array
of variables we like to name. The zeros will be in the same positions
as they occur in the simple structure Vfp (or Vrs) matrix, and the
only unusual feature which the matrix user will have to watch is
that (as illustrated in Table 3) there can here be more than one
unity in one row or one column, i.e., communality is a concept not
applicable to the Vfi matrix, since the factors are not considered as
operating simultaneously.

The factor influence matrix, Vfi, is thus to be regarded as a very


Table 3
The Factor Influence or Factor Mandate Matrix, Vfi

VARIABLES          FACTORS
              F1     F2     Fk
V1             1      0      1
V2             0     -1      1
V3            -1      0      0
Vn             0      1      0


⁷ For clarity I would personally prefer factor mandate to factor
influence, since the latter suggests degrees and the former the
required absolute action [...]. However, the current disinclination
toward terms outside basic English constrains me to what is probably
the easier, more adoptable term.




special kind of transformation from the Vfp. (Incidentally, there
exists a reciprocal matrix to this, corresponding to the Vfe, wherein
one indicates simply whether a variable contributes at all to a factor
score. This eighth matrix, which would not be "causal," can be
symbolized V4.) The Vfi is one which J. S. Mill might have written,
had he been addicted to matrices, to express the procedure for
inferring the nature of anything by applying his principles of
concomitant variation. Certainly it is the matrix from which we should
seek to infer the nature of a given factor, for it shows (a) what
variables the factor influences positively, negatively, or not at all,
and (b) what other factors share the property of having such influence
on the given variables.
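
Since the zeros of the influence matrix fall where the simple structure
pattern has its zeros, one rough way to draft such a matrix is to
reduce a pattern matrix to its signs. This is only a sketch of that
shortcut, not the partialing or manipulative procedure by which the
matrix is defined. The hyperplane band of .10, the function name and
the loadings are all assumptions made for illustration.

```python
def factor_influence(v_fp, width=0.10):
    """Reduce a pattern matrix to +1, -1 and 0 entries."""
    def mandate(loading):
        if abs(loading) <= width:
            return 0                      # variable lies in the hyperplane
        return 1 if loading > 0 else -1   # direction of the influence
    return [[mandate(x) for x in row] for row in v_fp]

# Hypothetical simple structure pattern: four variables, three factors.
v_fp = [[0.55, 0.04, 0.31],
        [0.02, -0.47, 0.28],
        [-0.36, 0.06, -0.03],
        [0.00, 0.52, 0.07]]

v_fi = factor_influence(v_fp)
```

With these invented loadings the result has the same entries as the
illustrative matrix of Table 3, including two unities in the first and
second rows.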

One wonders, however, whether a DV matrix might not be found
which expresses the peculiar and particular action of a given factor,
uncontaminated by other factors, without going to the extreme of
[...] On the other hand, in researches which [...] of variables
(Appendix 2, in (Cattell, [...] cultural situation. [...] Factor
Contribution Matrix, Vdfc. This, which stands at virtually the
opposite extreme from the Vfi, is nevertheless the only one of several
possible alternatives which offers logically an equal completeness of
separation. Whereas the Vafc says the nature of the factor is to be
inferred from what disappears when it is wiped out, and includes thus
both its direct effects and those through other factors, the logic here
is that the factor is to be understood by what it does to variables
entirely on its own. Of course, there is an interpretation of this
verbal logic which would lead one back to the Vfi, with its ones and
zeros. But here we wish to interpret in a sense parallel to the Vafc,
namely, by asking what fraction of the variance of each variable (when
the rest is accounted for by the normally present factors) is due to
the direct and single influence of this factor. Parenthetically, it is
helpful to consider the Vafc and Vdfc also in relation to the
Cattell-Adcock generalization (Cattell & Adcock, 1962) of the
Schmid-Leiman transformation. The former states the loadings of the
variables directly on oblique second order and third order factors, the
latter on orthogonal second orders. It is this higher order variance
which is cut out in the Vdfc and added in the Vafc.

At first one might think to accomplish this by a "part correlation"
(Cattell, 1955a), taking the other factors' variance out of this one,
but not out of the variables, but this (subject to a lower bound
choice of communalities) actually produces the Vrs. What we really
want, conceptually, in the Vdfc, is the action of the given factor in
dissociation from that of the higher order factors in which it is
mounted, and which are given by the interfactor correlations.
Statistically, this is the loading (correlation here) of the factor on
the variables when higher orders are partialed from the factor and from
the variables.

As far as the less radical separation implied by the Vafc is
concerned, it is at least easy to calculate. The writer is indebted to
Research Assistant Owen White for showing that it corresponds to one
of the fifteen hybrids above, in fact as the product of the
corresponding terms in the Vfp and Vfs matrices (a geometric mean of
the two


In this field of high interdependence of formulae it is not unusual to
arrive unexpectedly by the back door on some concept long known from
the front door. The present identity can be shown without formulae,
for the vector representing the "parted out" factor is by definition
that factor projected upon [...] orthogonal to all other factors. But
what we call the [...] also a unit length vector orthogonal to its
hyperplane and [...] other factors except its own associated factor.




to be exact). These figures show how much of the variance of each
variable is simultaneously associated both directly with the variance
of the factor, and indirectly with the latter's covariance with other
factors. The general properties of this Vafc are more fully discussed
elsewhere (White, 1962), but obviously, as just stated, it describes
how much of the variance of the variables would disappear with the
total removal of that factor from the system.
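
The calculation is a term-by-term product of pattern and structure
values. A minimal sketch follows, with invented two-factor figures (the
factors are assumed to correlate +.60) and a hypothetical function
name. It uses the product itself, as described, rather than its square
root.

```python
def associated_contribution(v_fp, v_fs):
    """Element-by-element product of pattern and structure values."""
    return [[p * s for p, s in zip(p_row, s_row)]
            for p_row, s_row in zip(v_fp, v_fs)]

# Hypothetical pattern and structure values for three variables.
v_fp = [[0.875, -0.425], [0.75, 0.05], [0.0, 0.6]]
v_fs = [[0.62, 0.10], [0.78, 0.50], [0.36, 0.60]]

v_afc = associated_contribution(v_fp, v_fs)

# Summed across factors, the contributions of each variable add up
# to its communality.
communalities = [sum(row) for row in v_afc]
```

The small negative entry for the first variable on the second factor
arises because its loading and its correlation have opposite signs, and
a zero in either matrix gives a zero in the product.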

The Vafc and Vdfc are illustrated for the present example in Table
4, to be compared with the expressions in Table 1. First one may note
that the Vafc, as stated in introducing the "hybrids," has an
"emptier" look, with decidedly more near-zeros, due to adding the
zeros in the Vfp and Vfs, but we have already cautioned against
regarding this as really meaning a better simple structure. However,
an argument could be made, on the meaning of "associated factor
contribution," for a simple structure defined as maximizing the number
of such entries. Secondly, it will be noted that for interpretation
purposes the Vafc has the disadvantage that its significantly large
entries are all positive, i.e., it does not tell us in which direction
the variable and this factor "go together."


TABLE 4
Unique Characteristics of Factors (in DV Profiles), Shown in
Associated and Dissociated Factor Contribution Matrices

                                      U.I. 1          U.I. 32         U.I. 24
                                      Intelligence    Introversion    Anxiety
No.  Variable                         Vafc*   Vdfc    Vafc    Vdfc    Vafc    Vdfc

20   Numerical Performance            +.248   +.500   +.000   +.000   +.005   -.072
31   Classification                   +.266   +.520   +.002   +.050   +.010   -.092
32   Short Time Memory                +.134   +.360   +.006   -.080   +.004   +.033
     16 P.F. (F-) Scale               +.000   +.030   +.186   +.440   +.019   -.111
     (H-) Scale                       +.001   +.021   +.299   +.444   +.134   +.229
44   Fluency; Dreams/Total            -.000   -.000   +.00    +.030   +.006   +.046
99   Annoyability                     +.002   +.021   -.001   -.011   +.281   +.378
16   Readiness to Confess Frailties   +.005   +.053   -.001   -.032   +.184   +.291
55   Tendency to Agree                +.001   -.050   +.002   +.030   +.059   +.157

* [...] have the unusual property of being [...] negative, as in the
instance we cited above of loading [...] correlation having opposite
[...]. A negative variance contribution would give us [...], but these
figures are combinations of variance and covariance, the latter being
sometimes [...]. [...] combinations of the variable variance
associated with single [...] (always positive) with factor covariances
(sometimes connected with a negative correlation), and the latter, in
rounding calculations, may outweigh the former with regard to a given
variable.

The Vdfc here manifests no startling differences, but keeps close to
the Vfp and Vfs. Only to the extent that the higher order factors
contribute to a variable very differently from the factor under
examination will the Vdfc and Vfp diverge. For example, the Vdfc lowers
the relations of desurgency (F-), relative to threctia (H-), in anxiety
(U.I. 24) more than do the Vfp or Vfs matrices. It also more clearly
drops the ability variables (numerical and classification
performances) into the hyperplane of the anxiety factor. As to the
first, psychologically one observes that although anxiety is often
correlated with depression (desurgency) it is distinct from it, whereas
the definition of threctia (H-, response magnitude to threat) is
consistent with the notion that anxiety is a necessary component
therein. If the Vdfc should prove to continue to discriminate
similarly, with better regard for independent psychological evidence,
in a number of known psychological instances, our theoretical
expectation that it offers the best basis for interpretation of the
pure factor might be interestingly confirmed.


The Best Use of DV Matrices for Interpretation 


The last statement above, concerning the Vdfc being the "best"
matrix, needs qualifying by the reminders that (1) the Vfi is in
another sense the best, (2) maximum interpretation comes from
comparing the several matrices, and (3) various conditions now to
be discussed must hold. What we are initially saying is only that, on
logical grounds, we would expect simple structure to yield
constancy of loading pattern in the Vdfc, and that the loading pattern
would most truly represent the relative influence of the factor per
se upon the variables.

In so far as this Vdfc pattern fails of constancy one would expect it
to do so only through inconstancy in the following conditions:


(1) Imperfect locating of the simple structure inherently present.




(2) Changing experimental reliabilities of the variables across 
studies. 

(3) Departure from linearity of the factor-variable relation. 

(4) Changing real variances of the factor considered, or other
factors, across studies.


The last is stated in terms of factors (instead of variables, as in
Thurstone's treatment) since we assume a single variable would not
change its real variance except through one or more factors doing
so. Nothing can be done about (3) above, but the others imply that
for factor identification and interpretation to be pursued
dependably, e.g., by Vfp or Vdfc, one should (a) work with loadings
corrected for attenuation for unreliability, (b) correct for the
effects of differences in real factor variance across samples or
populations, and (c) take more care to ensure that the rotation
position at which one stops really represents the greatest possible
maximization of hyperplane count. Of course, (b) supposes that the
factor in question always possesses the same entourage of other factors
in the different experiments, and that these account for all variance
of all variables. An approach to this is made in very large and well
sampled collections of variables, where that familiar confession of
ignorance, the [...] identification and interpretation studies be
designed with marker variables for all known important factors.
When one considers how ina, 


heeded—at least in factor studies up to five years ago—and how 


matrices are used, would nevertheles 


(corresponding to the Vy matrix), 
et al., in press) with тер ү 

: RÀ replicated personality meas- 
ur : 

... support to this conclusion. This conclusion also implies a
preference, when the above ideal conditions cannot be met, for using
the salient variable similarity index, s (cutting into two categories
of “significant” and “hyperplane” loadings), for factor matching.
However, growing recognition of this problem may lead to future design
of experiments containing the necessary information for these more
reliable matching and interpretation decisions.


Summary 


1. Identification (matching) and interpretation (inference as to
the underlying influence) of a factor take place after a unique
rotational resolution. Alternative ways of expressing a factor, as
developed here, affect both the resolution adopted and the ensuing
matching and interpretation decisions.

2. The most dependable decisions require consideration of six 
sources of information: (i) The nature of the dimension-variable 
(DV) relation profiles. (ii) The correlations of the entity with other 
known, landmark factors. (iii) Its size or “importance” (Mean total 
variance contribution in a defined domain of variables). (iv) Iden- 
tification by elimination, i.e. considering other possible matches, in 
matrices planned to give complete series. (v) Study of modifications 
producible in (i), (ii) and (iii) by experimental conditions, samples 
and designs, e.g., comparing R- and P-techniques. (vi) The be- 
havior of the total factor score in manipulative and survey experi- 
ments. 

3. This article enlarges only on the first and begins by recognizing
that there are (including hybrids) some fifty possible matrices
expressing dimension-variable relationships. At most six are
traditionally recognized in textbooks, and only one, the reference
vector structure, V_rs, is all too commonly offered in research
articles as the basis for interpretation and identification.

4. The derivations, transformations and meanings of the six main
DV matrices have been set out and have been illustrated by values
from an objective test factorization on a sample of 500, with respect
to the source traits extraversion, intelligence and anxiety. Mutually
transformable, all are equally mathematically correct. But some
lead to different simple structure or confactor rotations from others,
and all make different kinds of statements about what the dimension
does.

5. Since different DV matrices add different kinds of information,
all are useful in contributing to interpretation, but a “best” matrix
is conceivable in that one may better define a unique resolution and
more directly indicate the nature of the underlying scientific
influence than another. To make a decision possible one must go beyond
mathematico-statistical definitions of a factor and define it as a
scientific influence, such that the variables are dependent and it is
the independent variable. Only on this basis can simple structure
and confactor resolutions logically be adopted. The existing DV
matrix concept best for identification and interpretation then
becomes the V_fp.

6. Pursuing the ideal of a matrix that will express the essence of
a particular factor in maximum abstraction from disturbing, temporary
or inessential conditions, one arrives at three new matrices, the
V_afc, V_dfc and V_fi. The associated factor contribution matrix,
V_afc, asks what is the contribution of the factor to the variable
variance simultaneously through direct influence and the covariance
with its normal entourage of factors, i.e., its total effect in the
circumstances. The dissociated factor contribution matrix, V_dfc,
determines what is peculiar to the factor alone, by partialing the
higher order factors from the factor and the variables. The factor
influence matrix, V_fi, goes the whole way and partials out from
factor and variables all variance from all other factors. It leaves a
matrix showing what variables the factor, by its nature, is capable
of influencing at all, and corresponds to what would be found from
a large series of completely controlled manipulative bivariate
experiments.

The V_fi and V_dfc are presented for the same three personality
factors (extraversion, intelligence, anxiety) as were illustrated in
the DV matrices at present known to psychologists. The V_fi has an
attractive appearance by virtue of more hyperplane zeros. This may
prove meretricious, and has the disadvantage that one loses the sign
showing the direction of action on the various dependent variables.
The V_dfc is usually close to the V_fp, but where it differs, seems to
agree better with the psychological understanding of the meaning of
the factor. Obviously, research needs to be directed, both in
statistical theory and in consideration of psychological, substantive
results, toward evaluating the V_dfc (and perhaps the V_fi) as
possibly showing better simple structure determination, greater
matching constancy across researches, and better interpretability
than the older DV forms.




8. Since all DV matrices, except V_fi, are the complex expressions
of many influences, one should not expect the DV profile to be
identical for the same factor, across two experiments, as some factor
matchers have assumed. The V_fp, or with more time, the V_dfc, is
recommended for resolution, matching and interpretation, but with
prior correction for test unreliability and with post-rotation,
iterative correction for changing factor variances, as well as
experimental designs for keeping a standard and comprehensive company
of associated factors in every matrix. The agreement is then better
measured by the (oblique) pattern similarity coefficient, r_p (or, in
uncorrected data, by the salient variable similarity index, s), than
by r.

These formulations have arisen from practical discussions on the
matching and interpretation of the twenty personality factors in
objective tests, after fifteen years of research and five to twenty
experimental replications (Cattell, 1955c; 1957). The writer is
indebted to the reactions of those involved in these researches,
notably O. White, F. Damarin, J. Hundleby, K. Pawlik, H. Scheier, and
F. Warburton, as well as to K. Dickman and H. Kaiser, none of whom,
however, should be considered responsible for defense of the concepts
stated.


REFERENCES 


Ahmavaara, Y. “Transformation Analysis of Factorial Data.” Annales
Academiae Scientiarum Fennicae, Series B, Vol. 88, 2, Helsinki, 1954.

Burt, C. L. Factors of the Mind. London: University of London
Press, 1940.

Cattell, R. B. The Description and Measurement of Personality.
New York: World Book Company, 1946.

Cattell, R. B. “r_p and Other Coefficients of Pattern Similarity.”
Psychometrika, XIV (1949), 279-298.

Cattell, R. B. Factor Analysis. New York: Harper & Brothers, 1952.

Cattell, R. B. “Psychiatric Screening of Flying Personnel:
Personality Structure in Objective Tests.” USAF School of Aviation
Medicine, Report No. 9, Randolph Field, Texas, June, 1955. (a)

Cattell, R. B. “Factor Rotation for Proportional Profiles: Analytical
Solution and an Example.” British Journal of Psychology, Statistical
Section, VIII (1955), 83-92. (b)

Cattell, R. B. The Objective Analytic Personality Factor Battery.
Institute of Personality and Ability Testing, 1602 Coronado Dr.,
Champaign, Ill., 1955. (c)

Cattell, R. B. Personality and Motivation Structure and Measurement.
New York: World Book Company, 1957.

Cattell, R. B. “Experiments on Sources of Perturbation in the Factor
Analytic Resolution of Traits.” Advance Publication No. 13,
Laboratory of Personality Assessment, University of Illinois,
Urbana, November, 1960.

Cattell, R. B. and Adcock, C. J. “Computing the Loadings of Variables
on Higher Order Oblique Simple Structure Factors: Theory and
Examples.” Advance Publication No. 18, Laboratory of Personality
Assessment, University of Illinois, Urbana, Illinois, 1962.

Cattell, R. B. and Baggaley, A. В. “The Salient Variable Similarity 
Index—s—for Factor Matching.” British Journal of Statistical 
Psychology, XIII (1960), 315-327. 


Cattell, R. B., Beloff, H., and Coan, R. W. The High School
Personality Questionnaire Handbook. Institute of Personality and
Ability Testing, 1602 Coronado Dr., Champaign, Ill., 1958.

Cattell, R. B. and Dickman, K. “A Dynamic Model of Physical
Influences Demonstrating the Necessity of Oblique Simple Structure.”
Psychological Bulletin, in press.

Cattell, R. B. and Meschieri, L. “The International Cross-Cultural
Constancy of Personality Factors Examined on the 16 P. F. Test:
I. American-Italian Relations.” Advance Publication No. 12,
Laboratory of Personality Assessment, University of Illinois, 1960.

Cattell, R. B. and White, O. “The Method of Confactor Resolution
Illustrated for the Oblique Case on the Cups of Coffee Model.”
Advance Publication No. 17, Laboratory of Personality Assessment,
University of Illinois, Urbana, Illinois, 1962.

DuBois, P. H. Multivariate Correlational Analysis. New York:
Harper & Brothers, 1957.

Fruchter, B. Introduction to Factor Analysis. New York: D. Van
Nostrand, 1954.

Haggard, E. A. Intraclass Correlation and the Analysis of Variance.
New York: Dryden Press, 1958.

Harman, H. H. Modern Factor Analysis. Chicago: University of
Chicago Press, 1960.

Hundleby, J. D., Pawlik, K., and Cattell, R. B. The First Twenty ...
Dimensions in Objective, Behavioral Measurements. Urbana: University
of Illinois Press, in press.

... Interpretation of Factors. Unpublished manuscript.

Kelley, H. P. “A Factor Analysis of Memory Ability.” Unpublished
doctoral dissertation, Princeton University, 1954.

Peterson, D. R. “The Age-Generality of Personality Factors Derived
from Ratings.” Educational and Psychological Measurement, XX (1960),
461-474.

Thurstone, L. L. Multiple Factor Analysis. Chicago: University of
Chicago Press, 1947.

Tucker, L. R. “A Method for Synthesis of Factor Analysis Studies.”
Personnel Research Section, Report No. 984, Department of the Army,
1951.

Wherry, R. J. “The Past and Future of Criterion Evaluation.”
Personnel Psychology, X (1957), 1-5.

White, O. “Some Properties of Factor Contribution Matrices.”
Unpublished manuscript, 1962.

Wright, S. “The Interpretation of Multivariate Systems.” In O.
Kempthorne, Statistical Mathematics in Biology. Ames, Iowa: Iowa
State College Press, 1955.

Wrigley, C. F., et al. “Use of the Square Root Method to Identify
Factors.” Psychological Monographs, LXX (1956), No. 23.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 4, 1962


ISOLATION OF ELEVATION AND SCATTER 
COMPONENTS IN PERSONALITY AND 
ATTITUDE QUESTIONNAIRES¹


RICHARD Q. BELL 


Child Research Branch 
National Institute of Mental Health 


A number of authors have been able to demonstrate that certain 
subjects have a tendency to endorse a wide variety of personality 
and attitude questionnaire items, including logically contradictory 
ones (Bass, 1956; Chapman, 1957; Cohn, 1953; Jackson, 1957). 
This has been used to throw doubt on a literal interpretation of the
meaning of high agreement scores on such questionnaires as the F
scale. Cronbach (1946, 1950) and Berg (1955) have reviewed studies 
indicating that this tendency to agree, or acquiescence set, is com- 
mon to a large number of personality and attitude questionnaires in- 
volving relatively ambiguous items and using fixed categories of re- 
sponse such as “agree,” “disagree,” “like,” “dislike,” and “true,” 
“false.” Fiske (1957a, b) has reported an extensive series of studies 
demonstrating individual differences in variability in a number of
psychological test situations including the fixed category response
type mentioned. “Carefulness” or “evasiveness” appears to be one 
manifestation of low variability (Fiske, 1957a, p. 463). In summary, 
in a variety of testing situations differences in both acquiescence and 
variability can be demonstrated between subjects under circum- 
stances where test content is held constant. 

The findings on acquiescence offer a possible explanation for the
frequent inability of test developers to produce scales with low
intercorrelations. The findings on variability indicate that subjects
select extreme response options because of general response
tendencies as well as reactions to specific item content. Both
findings point out the need for scale development procedures which
isolate such general tendencies from determinants of response which
are more specific to item content. Whether a general tendency of
response represents acquiescence, variability, a preference for
socially desirable items, or a reaction to a content factor common to
all items in a questionnaire, the purpose of specific scale
development is best served by procedures which isolate these effects.
The question of partialing out different components of these general
tendencies for their own theoretical or predictive purpose is another
problem.

This paper describes a procedure which makes it possible under
certain circumstances to isolate two general response tendencies in
most questionnaires. One of these will be referred to as elevation,
the other as scatter.

¹ The author is indebted to Irwin A. Berg, Lee J. Cronbach, Donald W.
Fiske, and Hobart G. Osburn for helpful suggestions and criticisms of
an earlier draft of this paper.


[Figure 1. Corrected and uncorrected questionnaire responses for two
subjects differing in elevation. Horizontal axis: questionnaire items;
vertical axes: response categories (Strongly Agree to Strongly
Disagree) and corrected response weights.]

... variance specific to each item, as well as variance associated
with general response tendencies. Both subjects A and B have
responded in the category Strongly Agree on item 3. However, it is
apparent that in the case of subject A this response represents a
modal response, judged from his responses to the other items, while in
the case of subject B the response to item 3 represents a considerable
departure from the level of agreement or disagreement shown on most of
the other items. Both subjects show similar scatter about their own
level. In a similar protocol for subjects C and D in Figure 2, it can
be seen that, though both show the same general level of response to
all the items, they differ in scatter; thus the fact that both have
responded to item 8 in the category Strongly Disagree can hardly be
considered a similar response when judged against the background of
their other responses.

[Figure 2. Corrected and uncorrected questionnaire responses for two
subjects differing in scatter. Horizontal axis: questionnaire items;
vertical axes: response categories (Strongly Agree to Strongly
Disagree) and corrected response weights.]

Subject D’s deviation in the case of item 8 is much greater than
his other deviations from his own level, whereas subject C’s
deviation from his level is no different from his deviations shown on
all the other items. The implication can be drawn that the
interpretation and statistical handling of the meaning of these
responses to the specific items must take into consideration both the
subject’s general elevation and scatter. The method advocated is an
application of Cronbach and Gleser’s analysis of profile components
into elevation, scatter, and shape (1953, pp. 459-461). This procedure
consists in this instance of obtaining a measure of the shape of the
subject’s item response profile by standardizing the responses given
by the subject. That is, the mean of the subject’s response weights
for all items is a measure of elevation, the standard deviation of
the subject’s response weights to all items is a measure of scatter,
and the shape component is a result of subtracting the mean from
each response weight and dividing by the standard deviation.
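
The decomposition just described can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the
author's own computing procedure: the function name
`decompose_profile`, the 1-to-7 response weights, and the two
eight-item protocols are hypothetical, and the population form of the
standard deviation (divisor n) is assumed because the text does not
say which divisor was used.

```python
from statistics import mean, pstdev


def decompose_profile(weights):
    """Split one subject's item response weights into three parts.

    Returns (elevation, scatter, shape), where elevation is the mean
    weight, scatter is the standard deviation of the weights, and
    shape is the list of weights standardized within the subject.
    The population standard deviation is an assumption of this sketch.
    """
    elevation = mean(weights)
    scatter = pstdev(weights)
    if scatter == 0:
        # A subject who gives the same response to every item has no
        # defined shape, because the division below would be by zero.
        raise ValueError("scatter is zero; shape is undefined")
    shape = [(w - elevation) / scatter for w in weights]
    return elevation, scatter, shape


# Hypothetical protocols: both subjects answer item 3 with weight 7.
subject_a = [6, 7, 7, 5, 6, 7, 6, 4]  # high elevation
subject_b = [2, 3, 7, 1, 2, 3, 2, 4]  # low elevation

for name, weights in (("A", subject_a), ("B", subject_b)):
    elevation, scatter, shape = decompose_profile(weights)
    print(name, elevation, round(scatter, 3), round(shape[2], 3))
```

With these made-up weights the raw response to item 3 is identical for
the two subjects, but the standardized value is 1.0 for the first and
about 2.3 for the second, which is the kind of distinction drawn above
for subjects A and B.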

When these operations are carried out on the responses of the
four subjects shown in Figures 1 and 2, a rather different protocol
emerges, as shown by dashed lines in both figures. Subjects A and B
are not represented as having made an equivalent response to item
3, and subjects C and D can be differentiated on the basis of item 8.
Information lost when the subject’s elevation and scatter are ignored
in interpreting responses to items 3 and 8 is salvaged when the
responses are standardized using the subject’s own mean and standard
deviation. The mean and standard deviation must also be retained as
measures of the subject’s performance along with the corrected
profile of responses to the items.

Thus the protocol for subject A now includes ten elements, eight
of these being responses to the items, each response being expressed
in terms of the subject’s mean and standard deviation, the other two
consisting of the mean and standard deviation used to represent the
subject’s elevation and scatter and considered as items of
information potentially useful in themselves.


... categories are not scalable for most of the items in a
questionnaire, the derivation of a mean and standard deviation of
item response weights is not justified. Thus, if the category “?” ...
it is not possible to assign quantitative weights to the three
categories ... reason, where scalability of the response categories
has not been established in the standardization of a questionnaire,
it may be advisable to plot regressions of item response frequencies
against total scores for scales or against other criteria to detect
possible curvilinearity and other obvious departures from linearity.
On a priori grounds alone, response categories which do not use
systematically graded adjectives, or which use such intermediate
categories as “undecided” or “?”, should be considered suspect until
tested.


Constraint on Variability 


It is mandatory that the questionnaire be set up so that elevation
and scatter are well measured; otherwise the additional computations
required are pointless and the possible validity which may be
extracted from the components themselves is attenuated. In the case
of both elevation and scatter, adequate measurement requires that
the number of fixed response categories be such as to permit the
subject to fully manifest the components. Fiske (1957b, p. 321) has
provided a formula for estimating the constraint on variability
imposed by the number of response categories. Variability shown in
a single session must be distinguished from variability shown
between two or more sessions, but both types will be affected by the
number of response alternatives available. Although the general
format need not be changed in tests using such categories as “true,”
“false,” it is readily apparent that intermediate response options
would have to be supplied. Tests with even three response categories
clearly do not provide an adequate basis for computing a standard
deviation of the subject’s responses to the individual items.


Application to Single Scales 


The larger the number of items in a questionnaire the more stable
the elevation and scatter components will be. This consideration
favors application to multidimensional questionnaires. There is also
some question of whether the components should be extracted from
a single scale. In such a case there is a risk that the elevation
factor might reflect the subject’s reaction to specific scale content
more than a general response tendency. A scatter score might also
simply be negatively correlated with elevation unless the mean
elevation was at the mid-point of the scale. If elevation scores were
displaced toward the upper end of the scale, high scores would
automatically involve low scatter. In the case of a single bipolar
scale in which a high score is obtained by selecting opposite
response options for oppositely worded items, a high score on scale
content would be associated with a high scatter score. Selecting
extreme response options so as to respond appropriately to the
content would result in high scatter. For these reasons it seems that
elevation and scatter scores are best isolated in a multidimensional
instrument, since in such a case there is less likelihood that a
subject’s reaction to specific item content would be reflected
heavily in the general factors isolated.
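
The automatic link between high elevation and low scatter noted above
follows from the bounded response scale. As a rough illustration
(not part of the original procedure), for weights confined to the
interval from `low` to `high` with mean m, the population standard
deviation cannot exceed the square root of (high - m)(m - low), a
standard result known as the Bhatia-Davis inequality. The function
name `max_scatter` and the 1-to-4 weighting used below are assumptions
made for the example.

```python
from math import sqrt


def max_scatter(elevation, low, high):
    """Upper bound on the population SD of weights lying in [low, high].

    The bound is sqrt((high - elevation) * (elevation - low)). It is
    reached only when every response sits at one of the two extremes,
    so with graded categories it is a ceiling, not a typical value.
    """
    if not low <= elevation <= high:
        raise ValueError("elevation must lie between low and high")
    return sqrt((high - elevation) * (elevation - low))


# Hypothetical four-category scale weighted 1 through 4.
for elevation in (2.5, 3.0, 3.5, 4.0):
    print(elevation, round(max_scatter(elevation, 1, 4), 3))
```

The ceiling is largest at the mid-point of the scale and shrinks to
zero as elevation approaches either end, so a scatter score taken from
a scale with displaced elevation is constrained regardless of the
subject's response tendencies.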


Illustrative Applications 
Effect on Relations Between Items 


Table 1 provides an example of the effect on a set of scales of
isolating elevation and scatter components. The questionnaire used in
this example, as well as in most other methodological studies reported
in this paper, is the Parental Attitude Research Instrument (PARI)
(Schaefer & Bell, 1958). This instrument uses fixed categories of
response ... the categories “strongly agree” through to “strongly
disagree.” The examples provided ...




Constants obtained from the complete 115-item questionnaire were 
thus applied to responses for only the 25 items. Two sets of product- 
moment correlations were computed between the 25 items, one based 
on the uncorrected weights, the other based on the weights corrected 
as indicated above. Table 1 summarizes these item intercorrelations, 
showing the mean of the ten correlations between the five items 
within a scale and the mean of the 25 correlations between the five 
items constituting one scale and the five items constituting another 
scale.
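
The two summary figures described above can be sketched as follows.
This is a minimal illustration with hypothetical helper names
(`pearson`, `mean_within`, `mean_between`) and made-up data; each item
is represented as a list of response weights, one per subject. Five
items in a scale give ten within-scale pairs, and two five-item scales
give 25 between-scale pairs.

```python
from itertools import combinations, product
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dev_x = [v - mean_x for v in x]
    dev_y = [v - mean_y for v in y]
    sum_xx = sum(d * d for d in dev_x)
    sum_yy = sum(d * d for d in dev_y)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("an item with no variance has no correlation")
    return sum(a * b for a, b in zip(dev_x, dev_y)) / sqrt(sum_xx * sum_yy)


def mean_within(scale_items):
    """Mean correlation over all distinct pairs of items in one scale."""
    pairs = list(combinations(scale_items, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def mean_between(items_a, items_b):
    """Mean correlation over every item of one scale with every item of another."""
    pairs = list(product(items_a, items_b))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Tiny made-up example: two items in one scale, one item in another,
# four subjects each.
scale_x = [[1, 2, 3, 4], [2, 4, 6, 8]]
scale_y = [[4, 3, 2, 1]]
print(mean_within(scale_x), mean_between(scale_x, scale_y))
```

Running the same two functions once on uncorrected weights and once on
weights standardized within each subject would give the paired
"uncorrected" and "corrected" entries of a table such as Table 1.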

Following the correction, Scale 1 showed an increase in the cor- 
relation between its own items while correlations of its items with 
other scales did not show a corresponding increase. Scale 4, and, to a 
certain extent, Scales 7 and 10 showed a decrease in homogeneity 
such that they tended to disappear as scales.

The effect on scaling of isolating elevation and scatter may also
be seen in changes in internal consistency coefficients computed with
Kuder-Richardson Formula 20. The internal consistency coefficient
for each scale based on corrected response weights is listed following
that for uncorrected weights, the scale number being given for
readers interested in a content analysis of the particular scales:
Scale 1, .45 to .68; Scale 4, .68 to .71; Scale 7, .60 to .40;
Scale 9, .54 to .56; Scale 10, .74 to .54. With the exception of
Scale 4, results appear consistent with those shown in Table 1.
Scale 1 has increased in internal consistency. Scales 7 and 10 have
decreased. Scale 9 has remained essentially unchanged. These data
illustrate alterations in scale structure which may result from
application of the procedure.


TABLE 1

Change in Mean Item Intercorrelations Within and Between Scales
Following Correction for Elevation and Scatter

                             Scale Number
                     1      4      7      9     10
Scale 1
  Uncorrected       15     08     09     14     19
  Corrected         31    -10     07     20    -19
Scale 4
  Uncorrected              27     23     12     28
  Corrected                07     07     01     03
Scale 7
  Uncorrected                     24     18     00
  Corrected                       13     10    -16
Scale 9
  Uncorrected                            21    -02
  Corrected                              20    -15
Scale 10
  Uncorrected                                   36
  Corrected                                     19
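
For readers who wish to compute coefficients of the kind reported
above, a hedged sketch follows. Kuder-Richardson Formula 20 in its
original form applies to items scored 0 or 1; how it was applied to
weighted responses is not stated here, so the sketch uses the general
variance form (coefficient alpha), which reduces to Formula 20 when
items are dichotomous. The function name `internal_consistency` and
the data are hypothetical.

```python
from statistics import variance


def internal_consistency(scores):
    """Variance form of the Kuder-Richardson internal consistency coefficient.

    scores holds one row per subject and one column per item. The
    coefficient is (k / (k - 1)) * (1 - sum of item variances /
    variance of total scores), where k is the number of items.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("at least two items are required")
    item_variance_sum = sum(variance(column) for column in zip(*scores))
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Made-up data: three subjects answering three perfectly consistent items.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(internal_consistency(consistent))
```

Applying the function to the uncorrected weights and then to the
weights standardized within each subject would yield a pair of
coefficients comparable in form to those listed for each scale.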


Effect on Test-Retest Stability 


Test-retest correlations with and without the same isolation
procedure are available from another study using a sample of 60
unmarried student nurses administered a PARI questionnaire twice,
with an intervening period of three months. Certain scales used in
the standard instrument were omitted and three items previously
tested for homogeneity were added to each scale, but the
questionnaire was in other respects the same. Six eight-item scales
were selected from the 24-scale questionnaire for intensive analysis.
The test-retest correlation coefficients for the scales based on
corrected response weights are listed following the coefficients
based on uncorrected response weights, the scale number again being
given for readers interested in content analysis: Scale 1, .47 to
.57; Scale 5, 18 to .64; Scale 7, .64 to .53; Scale 10, .72 to .42;
Scale 18, .51 to .46; Scale 21, .62 to .63. The test-retest
coefficients for elevation and scatter were .87 and .55,
respectively.

Scale 1 from the study of internal consistency and scales 1 and 21
from the study of test-retest correlations are scales containing items
with high popularity values. Such scales either gained or remained
the same in internal consistency or test-retest reliability when
subjected to the isolation procedure. The other scales, which are more
representative of the questionnaire as a whole (Schaefer & Bell,
1958), show diverse effects in internal consistency when subjected to
the procedure but generally lower test-retest reliability.


Effect on Validity 


Two studies were loca 
questionnaire and whic 




In the first study to be discussed there were no differences in
elevation and scatter between the two criterion groups. Data used
were from a study by Mann (1957) in which 23 five-item scales from
PARI were administered to two criterion groups, one consisting of
66 mothers of children with cerebral palsy, the other being a closely
matched group of 66 mothers of normal children. Elevation and
scatter measures were obtained from the 23-scale scores for each
mother in each group. The 23-item profile was then corrected for
each subject using the subject’s own elevation and scatter measure.
Using the 5 per cent level of significance and a two-tailed test as a
basis for comparison, it was found that only two scales differentiated
the criterion groups using the uncorrected scores while four out of
the 23 scales differentiated using the response set scoring system.

In the second study (Zuckerman, 1960), a difference significant
beyond the 5 per cent level was found between two criterion groups
on elevation but no difference on scatter. The two criterion groups
being contrasted were a group of 162 mothers of children admitted
to child guidance clinics, and a group of 181 mothers sampled from
the general population. In this situation, isolation of elevation and
scatter resulted in a decrease of scales significant at or beyond the
5 per cent level from four, when the uncorrected scores were used, to
three when the corrected scores were used. At or beyond the 1 per
cent level of significance there was a reduction from three scales
significant with uncorrected scores, to only one significant with
corrected scores. It should be noted in Mann’s data that, if the
individual items in each scale showed the same effects as the scales,
correction for response sets would yield an increase from ten to
twenty significant items. On the same assumption, Zuckerman’s data
would show a decrease from twenty to fifteen items following the
correction.

Data could not be located in which the number of valid scales
was sufficient that large differences in number of differentiating
scales would be possible before and after the application of the
procedure. The above findings are only illustrative of effects which
might be obtained in validity studies. Ordinarily the isolation
procedure would be applied at the item level. Analysis of variance and
other statistical procedures (Webster, 1958) could be used if the
only object were the identification of variance associated with the
elevation factor in making contrasts between groups.


Discussion 
The fact that some PARI scales increased and others decreased
in internal consistency following isolation of elevation and scatter
is predictable in large part merely by considering the role of
elevation. If we assume that the extraction of elevation from a
multi-scale instrument will have an effect like that of removing the
first centroid factor, the resulting item relations will resemble
first residuals. The elevation measure may contain group factor
variance, but its extraction will not eliminate this source of
variance or change the arrangement of group factors. Where the scale
increases in internal consistency, it is most likely that the
saturation of the items with the variance specific to the scale has
been increased by removal of the components reflected in the
elevation measure. Where a scale decreases in internal consistency
following extraction of elevation, it is likely that the decrease is
proportionate to the previous contribution of the first centroid
factor to that scale. In either event, a more precise picture is
available of the factors contributing to the relations between items.
On the one hand, it is less likely that the internal consistency of a
scale will be underestimated, as it would be prior to correction if
the items measured elevation in differing amounts. On the other hand,
it is possible to see that items within some scales have little or no
internal consistency outside of that associated with the elevation
measure.
The above discussion has assumed that elevation reflects some
stable factor or factors operating in response to the questionnaire
and not temporary elements specific to a single testing session. This
seems justified in the case of PARI. The elevation measure showed
test-retest reliability of .87, the scatter measure .55. The value
for ... with those usually obtained in studies of variability ... of
response categories in this instance re... scatter. This would lead
to lower reliability. On ... range introduces a relationship with ...
so that part of the stability may have ... set is one source of the
... regarded as temporary ... At the request of the author, ...
tendency to check the category “like” ... of the Vocational Interest
Blank separated by an interval of 19 years. The test-retest
correlations were as follows for the total and various parts of the
questionnaire which used the “like,” “indifferent,” and “dislike”
categories: Part I, .55; Part II, .43; Part III, .51; Part IV, .35;
Total, .59. Some of these correlations reflect considerable stability
for this response tendency when compared with the stability of
content scores reported for a number of questionnaires in the study
by Kelly (1955).

In cases where elevation and scatter are primarily determined by 
factors which are stable over time, their isolation would result in 
lower test-retest reliability. This would still be a beneficial result 
since the new reliability estimates for the items or scales would be a 
more accurate indication of the stability of their specific variance. 
The stability of the elevation and scatter measures can be appraised 
separately. Since the elevation measure and, to a certain extent, the 
scatter measure showed stability over time in the data reported, it 
seems likely that the lower test-retest reliability for the four most
representative PARI scales following correction was a function of
removing these stable components. Correspondingly, it seems likely
that test-retest reliability has been overestimated in other scales in 
other similar instruments for the same reason. Considering the item 
relations after isolation of elevation as first residuals would also lead 
us to expect very low reliabilities, since such residuals are usually 
less reliable than the first centroid. Item reliabilities are low in most 
personality and attitude questionnaires. After isolation they would
be still lower and much larger scales might be needed to reach ac- 
ceptable levels of stability. 

The possibility should not be overlooked that temporary factors 
may also contribute to the elevation and scatter measures. As an 
example of the effect of temporary factors, a subject may in a single
testing session decide to “go along with” the questionnaire, agreeing
at a higher level than customary. In a particular testing session the
subject might also elect to take more extreme stands on the issues
involved and thus have a higher scatter score than on other occasions.
Under these circumstances an improvement in test-retest
reliability would result from isolating elevation and scatter.

Even if further analysis is not carried out to determine the relative
contribution of temporary or stable elements to the elevation
and scatter measures, an increase in precision of estimates of in- 
ternal consistency and test-retest reliability will be afforded by the 


710 EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 


isolation procedure. The isolation would only be inappropriate in 
the improbable situation where a questionnaire consisted of test 
items which exclusively and without error measured separate specific 
sources of variance. Even in this situation the basic data would not 
be affected materially since differences between subjects on elevation
and scatter would be minimal. The computations would merely
be a waste of time. 

Effects of the procedure on validity are predictable from the same 
proposition adduced to explain changes in internal consistency of 
scales. The saturation of an item or scale with its specific variance 
is likely to be increased by the removal of elevation and scatter.
Where the factors contributing to elevation and scatter are not re- 
lated to the criterion, an irrelevant source of variance is removed, 
the saturation of individual items or scales with specific variance is 
increased, and, if this specific variance is related to the criterion, an
increase in discrimination is accomplished. The results of the reanalysis
of Mann’s data are consistent with this interpretation. If
the specific variance were not related to the criterion the correction 
would be ineffective. On the other hand, if either elevation or scatter 
are related to the criterion, their extraction would result in an ap- 
parent decrease in discrimination regardless of whether the specific 
item or scale variance is related to the criterion. This appears to 
have been the case in the reanalysis of Zuckerman’s data. This de- 
crease in discrimination would be of little consequence because the 
elevation and scatter measures could be used by themselves to ac- 
complish the discrimination. Further analysis of factors contributing 
to elevation and scatter would then be desirable also.

A LEE and scatter may also affect validity by те 
КЕЧЫН Pise This would be the case if the wg 
ies M et d response sets, as mentioned earlier. 
validity could be ex Epid involved, some gain in apparent 

The illustrative ic sees removal of this source of error 
FARA ee ad aoe involved corrections for both eleva- 
Е oe has attempted 0 ФУ 
after other methodological E Eus, pepes bis + 

ata Я di udies, pointed to the possibility Е: 
ingly, it seems likely ey e adequately than elevation. Accor : 
ducing changes. A bleus. Loud qf = mos effective in PE 

e instrument with six or more response 




categories would provide more adequate data to compare the effects 
of elevation and scatter. It seemed more desirable to await the col- 
lection of data from a questionnaire with a more suitable format 
than to rework the presently available data. 

One scoring procedure in frequent use involves counting the num- 
ber of “like” and “dislike” responses as a measure of scatter. Where 
only three response categories are involved it could well be assumed 
that computing a mean and standard deviation of response weights 
would offer little advantage. When a larger number of response 
categories are available, however, the mean and standard deviation 
offer a considerable improvement in preventing loss of information 
and actual misrepresentation of differences between subjects. For 
example, in the case of a questionnaire using four response cate- 
gories varying from “strongly like” to “strongly dislike,” a subject 
may have responded by selecting “strongly like” in one out of four 
responses, the remainder being distributed equally between the other
categories. He would receive a higher elevation score than a subject 
who selected the next category below “strongly like” for all of his 
choices, if elevation were measured by a simple count of the number 
of “strongly like” responses. Their positions would be reversed if the 
mean of response weights were used. A subject who has exclusively 
selected “strongly like” will be considered as having the same scatter 
score as a subject who has selected an equal number of “strongly
like” and “strongly dislike” when the number of extreme response 
category choices is used as a measure of scatter. The standard 
deviation of response weights as a measure of scatter would clearly 
reflect the latter subject's greater scatter.
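
The two comparisons just described can be checked with a small sketch. The response weights (4 for “strongly like” down to 1 for “strongly dislike”), the twelve-item length, and the use of the population standard deviation are illustrative assumptions, not values taken from the data discussed here.

```python
from statistics import mean, pstdev

# Assumed weights: 4 = "strongly like", 3, 2, 1 = "strongly dislike".
TOP, BOTTOM = 4, 1

def count_top(responses):
    """Elevation measured by a simple count of "strongly like" responses."""
    return sum(1 for r in responses if r == TOP)

def count_extremes(responses):
    """Scatter measured by a count of extreme-category choices."""
    return sum(1 for r in responses if r in (TOP, BOTTOM))

# Hypothetical twelve-item protocols matching the cases in the text.
subject_a = [4, 3, 2, 1] * 3   # "strongly like" on one response in four
subject_b = [3] * 12           # next category below "strongly like" throughout
subject_c = [4] * 12           # exclusively "strongly like"
subject_d = [4, 1] * 6         # equal numbers of the two extreme categories

# Elevation: the count ranks A above B, the mean ranks B above A.
elevation_by_count = (count_top(subject_a), count_top(subject_b))
elevation_by_mean = (mean(subject_a), mean(subject_b))

# Scatter: the count cannot separate C from D, the standard deviation can.
scatter_by_count = (count_extremes(subject_c), count_extremes(subject_d))
scatter_by_sd = (pstdev(subject_c), pstdev(subject_d))
```

Under these assumptions the count and the mean order the first two subjects in opposite ways, and only the standard deviation distinguishes the last two.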

The question of computation burden involved in extracting elevation
and scatter has not been discussed at length since short-cuts,
approximations, and computer scoring methods can be developed
readily for any procedure which is useful.


Summary 


A method has been described which permits isolation of elevation
and scatter in multidimensional questionnaires using single stimulus
items and fixed categories of response. The method is an application
of Cronbach and Gleser’s analysis of profile elements into elevation,
scatter, and shape. The procedure consists of obtaining a measure
of the shape of the subject’s item response profile by standardizing




responses using the subject’s own mean and standard deviation of 
item weights for response categories endorsed. The mean is a measure 
of elevation, standard deviation a measure of scatter. Both isolated 
measures may be useful themselves as information on the subject’s 
general response tendencies. Without further analysis of these re- 
sponse tendencies, however, it is not possible to specify contributing 
components such as a common content factor, response sets or style, 
or preference for socially desirable items. 
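
The standardization summarized above can be sketched in a few lines. This is a minimal illustration, assuming numeric item weights and the population standard deviation; the function name `profile_elements` and the three protocols are hypothetical.

```python
from statistics import mean, pstdev

def profile_elements(weights):
    """Split one subject's item weights into elevation, scatter, and shape.

    elevation: the subject's own mean item weight
    scatter:   the subject's own standard deviation of item weights
    shape:     each response standardized by that mean and standard deviation
    """
    elevation = mean(weights)
    scatter = pstdev(weights)
    if scatter == 0:
        # A subject who used one category throughout has no shape information.
        shape = [0.0 for _ in weights]
    else:
        shape = [(w - elevation) / scatter for w in weights]
    return elevation, scatter, shape

# Three hypothetical subjects with the same pattern of relative preferences
# but different general response tendencies.
low = profile_elements([1, 1, 3, 3])    # lower elevation
high = profile_elements([2, 2, 4, 4])   # higher elevation, same scatter
wide = profile_elements([1, 1, 5, 5])   # same elevation as `high`, more scatter
```

All three protocols yield the same shape scores, so whatever distinguishes these subjects is carried entirely by the separately reported elevation and scatter measures.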

Since general response tendencies may lead to item intercorrela- 
tions within groups of items which would not otherwise show scale 
characteristics, the method provides a safeguard against spurious 
scales. General response tendencies may also interact with scale 
content so as to cause an underestimate of scale homogeneity. Accordingly,
an increase in item homogeneity can result from application
of the procedure. In general, with elevation and scatter isolated,
content interpretations may be made more readily. Test-retest reliability
estimates are also likely to reflect scale characteristics more
specifically if the procedure is used. The general necessity for the
procedure advocated is seen as deriving from a number of studies 
showing that general response tendencies play a greater role in re- 


sponse to the type of questionnaire mentioned than has been cus- 
tomarily recognized. 


REFERENCES 


Bass, B. M. “Authoritarianism or Acquiescence?” Journal of Abnormal
and Social Psychology, LI (1956), 616-623.

Berg, I. A. “Response Bias and Personality: The Deviation Hypothesis.”
Journal of Psychology, XL (1955), 61-72.

Chapman, L. J. and Campbell, D. T. “Response Set in the F Scale.”
Journal of Abnormal and Social Psychology, LIV (1957), 129-132.

Cohn, T. S. “The Relation of the F Scale to a Response Set to Answer
Positively.” American Psychologist, VIII (1953), 335.

Cronbach, L. J. and Gleser, Goldine C. “Assessing Similarity between
Profiles.” Psychological Bulletin, L (1953), 456-473.

Fiske, D. W. “An Intensive Study of Variability Scores.” EDUCATIONAL
AND PSYCHOLOGICAL MEASUREMENT, XVII (1957), 453-465. (a).




Fiske, D. W. “The Constraints on Intra-individual Variability in 
Test Responses.” EDUCATIONAL AND PSYCHOLOGICAL MEASURE- 
MENT, XVII (1957), 317-337. (b). 

Jackson, D. N. and Messick, S. J. “A Note on ‘Ethnocentrism’ and 
Acquiescent Response Sets.” Journal of Abnormal and Social 
Psychology, LIV (1957), 132-134. 

Kelly, E. L. “Consistency of the Adult Personality.” American Psy- 
chologist, X (1955), 659-681. 

Mann, Vera D. “A Study of the Attitudes of Mothers of Cerebral 
Palsied Children Toward Child Adjustment.” Unpublished Ph.D. 
thesis, American University, 1957. 

Schaefer, E. S. and Bell, R. Q. “Development of a Parental Attitude
Research Instrument.” Child Development, XXIX (1958), 339-361.

Webster, H. “Correcting Personality Scales for Response Sets or 
Suppression Effects." Psychological Bulletin, LV (1958), 62-64. 

Zuckerman, M., Barrett, Beatrice H. and Bragiel, R. "The Parental 
Attitudes of Parents of Child Guidance Cases." Child Develop- 
ment, XXXI (1960), 401-417. 


— 


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 4, 1962


SOCIAL AND PERSONAL DESIRABILITY IN THE 
ASSESSMENT OF WORK VALUES 


DONALD E. SUPER AND JAMES G. MOWRY, JR.
Teachers College, Columbia University 


Introduction 


THE purpose of the research described below was to determine the 
relative social desirability of 15 work values as measured by The 
Work Values Inventory (Hana, 1954; O'Hara & Tiedeman, 1959; 
Super & Overstreet, 1960). 

In the inventory, each value is represented by two descriptions, 
both intended to be equivalent expressions of that value. For ex- 
ample, Altruism is represented by the descriptions, “Work in which 
you can benefit others,” and “Work in which you can help other 
people.” Security is represented by the descriptions, “Work in which
you have a feeling of security,” and “Work in which you are sure of
a job, even in hard times.”

The Work Values Inventory is divided into two parts. Part One
consists of one set of 15 different value descriptions, each of which
is paired with each of the other 14 value descriptions. In Part Two,
the second set of 15 descriptions is similarly paired. Thus, each
value appears 28 times. In all, there are 210 pairs of value statements.

In order to determine the social desirability of the values, 14
graduate students at Teachers College, Columbia University, and
23 clients of the New York YMCA Vocational Service Center indicated
for each pair of value descriptions which one they considered
“more socially desirable.” Greater social desirability was defined as
being “most approved by others” and being “best for society for
people to seek.”




The students consisted of six men and eight women. The mean 
age of this group was 34.1 years. The clients were all men high 
school graduates under 30 years of age; their mean age was 21.6
years. As a group, they had completed 13.3 years of schooling. 

Scores for each value were obtained by determining the number of 
times a subject rated it as “more socially desirable" and subtracting 
from this figure the number of times he considered the paired value 
more socially desirable. Since a value could be rated as more socially 
desirable 28 times, or the other value rated more socially desirable 
28 times, scores could range from —28 to +28, or, by adding 28 to 
each score, from zero to fifty-six. The per cent of the time that each 
group considered a value more socially desirable was computed by 
dividing the group mean score by 56, the highest possible mean score. 
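
The scoring rule just described can be sketched as follows. The function names and the three-subject group are hypothetical, and the sketch assumes that a subject makes a choice on every one of the 28 pairs in which a value appears.

```python
APPEARANCES = 28  # each value appears in 28 pairs

def value_score(times_chosen, times_other_chosen):
    """Score for one value: choices for it minus choices against it, plus 28.

    The raw difference runs from -28 to +28; adding 28 gives a 0-56 scale.
    """
    return (times_chosen - times_other_chosen) + APPEARANCES

def per_cent_of_time(group_scores):
    """Group mean score divided by 56, the highest possible mean score."""
    group_mean = sum(group_scores) / len(group_scores)
    return 100.0 * group_mean / (2 * APPEARANCES)

# Hypothetical group of three subjects who chose a value 20, 14, and 25 times.
choices = [20, 14, 25]
scores = [value_score(n, APPEARANCES - n) for n in choices]
group_per_cent = per_cent_of_time(scores)
```

When every pair is judged, the shifted score is simply twice the number of times the value was chosen, so the per cent figure equals the mean proportion of pairs in which the value was preferred.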


Results 


Table 1 below shows the per cent of the time the values were considered
“more socially desirable” by the graduate students and by
the counseling clients. It can be seen that the two groups rated the


TABLE 1

Per Cent of the Time† Each Value Was Considered
“More Socially Desirable” by Counseling Clients
and by Graduate Students of Education

                              23        14 Graduate
Value                       Clients     Students     Difference

Intellectual Stimulation     69.6         62.6          +7.0
Achievement                  62.3         52.1         +10.2
Way of Life               [illegible]  [illegible]  [illegible]
Economic Returns             61.6         48.1         +13.5*
Altruism                     57.5         72.6         -15.1*
Creativity                   57.4         53.8          +3.6
Associates                   52.6         59.9      [illegible]
Security                     50.7         51.3          -0.6
Prestige                     48.7         54.3          -5.6
Management                   46.5         32.5         +14.0*
Variety                      41.8         31.9          +9.9*
Aesthetics                [illegible]     51.3         -14.7*
Independence              [illegible]  [illegible]  [illegible]
Supervisory Relations     [illegible]  [illegible]  [illegible]
Surroundings                 32.6         39.6          -7.0

† Per cent of the time: the group mean score for each value divided by the highest possible
mean score.

* Significant at the .05 level.


SUPER AND MOWRY 717 


values somewhat differently. The t statistic was computed to determine
the significance of the differences shown in the right-hand
column. It was found that the counseling clients rated the Economic
Return, Management, and Variety values significantly higher than
did the graduate students, and Altruism and Aesthetics significantly
lower.

In general, it would seem that the graduate education students 
tended to rate the values more nearly in accordance with the con- 
ventionally accepted social value hierarchy. Thus, one would expect 
Altruism to be considered relatively more socially desirable than 
Economic Return. That Intellectual Stimulation is placed third in 
social desirability, after Altruism and Way of Life, by the graduate 
students is perhaps a reflection of the fact that they are students of 
education. It is interesting that the counseling clients placed Intel- 
lectual Stimulation highest, although they did not choose it sig- 
nificantly more often than did the education students. 

Although the group differences found may reflect a true difference 
in the social value systems of the two groups and in the places of 
various values within a given hierarchy, it also seemed possible that 
there may have been some difficulty in distinguishing between social 
and personal desirability. If so, one would expect little difference 
between the ratings of groups asked to rate in terms of a) social desirability
and of b) personal desirability.

Consequently, a second group of 26 male YMCA counseling 
clients was asked to complete the inventory on the basis of personal 
desirability. The two groups were quite similar with respect to age
and education. Fifty-eight per cent of the personal desirability
(PD) group had attended or were attending college, as compared
with 48 per cent of the social desirability (SD) group. The mean
number of school years completed was 13.6 for the PD group and
13.3 for the SD group. The mean age of the PD group was 22.3
years; for the SD group it was 21.6 years.

The data from the PD group were tabulated in the same way as
those obtained from the SD group. The per cent of the time each
value was considered “more personally desirable” was computed for
the PD group and the t statistic computed to determine the significance
of the differences between the two groups. These data are
shown in Table 2.

It can be seen that only two of the values, Altruism and Independence,
were rated in a significantly different fashion by groups
asked to respond differently. These differences were in the expected
direction: the social desirability group rated Altruism higher and
Independence lower than did the subjects who rated in terms of
personal desirability.


TABLE 2

Per Cent of Time† Each Value Was Considered “More Socially Desirable”
by One Group of 23 Counseling Clients and “More Personally
Desirable” by Another Group of 26 Clients

                              Social         Personal
                            Desirability    Desirability    Difference
                            Ratings (SD)    Ratings (PD)    (SD) - (PD)
Value                           (%)             (%)             (%)

Intellectual Stimulation        69.6            67.9            +1.7
Achievement                     62.3            64.3            -2.0
Way of Life                     62.2            60.0            +2.2
Economic Returns                61.6            57.9            +3.7
Altruism                        57.5            44.7           +12.8*
Creativity                      57.4            51.1            +6.3
Associates                      52.6            46.5            +6.1
Security                        50.7            44.4            +6.3
Prestige                        48.6            54.8            -6.2
Management                      46.5            49.5            -3.0
Variety                         41.8            47.7            -5.9
Aesthetics                      36.7            37.5            -0.8
Independence                    35.1            46.7           -11.6**
Supervisory Relations           35.1            38.3            -3.2
Surroundings                    32.6            37.4            -4.8

† Per cent of the time: the group mean score for each value divided by the highest possible
mean score.

* Significant at the .05 level.
** Significant at the .01 level.
Since the social and personal desirability groups appear to be
from the same population, their value systems may be assumed
to be similar. It may therefore be concluded that in responding to
the Work Values Inventory, social desirability
may raise the Altruism score, depress the Independence score, and
have no effect on others. Whether social desirability does in fact
affect these two scores in normal testing situations was of course not
ascertained in this experiment.


Summary 
In order to determine the social desirability of 15 work value descriptions,
23 male clients of a YMCA vocational guidance center




and 14 male and female graduate students at Teachers College, 
Columbia University, completed an inventory of work values in 
terms of social desirability. The students tended to rate the values 
more in accordance with the conventionally accepted value hier- 
archy. In order to obtain an indication of the extent to which sub- 
jects distinguish between social and personal desirability, a second 
group of 26 male counseling clients was asked to complete the inven- 
tory in terms of personal desirability. A comparison of the two 
counseling groups revealed that the group rating in terms of social 
desirability rated Altruism significantly higher and Independence 
significantly lower than did the group which rated in terms of personal
desirability. There were no significant differences between the
group ratings of the other 13 values. It was concluded that, on the 
Work Values Inventory, Altruism scores may in some testing situa- 
tions be somewhat elevated and Independence scores somewhat low- 
ered, but that other scores are free from the effects of social de- 
sirability differences. 


REFERENCES 


Hana, A. M. “Work Values in Relation to Age, Intelligence, Socioeconomic
Level, and Occupational Interest Level.” Unpublished
Ed.D. thesis, Teachers College, Columbia University, 1954.

O'Hara, R. P. and Tiedeman, David V. “The Vocational Self Concept
in Adolescence.” Journal of Counseling Psychology, VI
(1959), 292-301.

Super, D. E. and Overstreet, Phoebe L. The Vocational Maturity of
Ninth Grade Boys. New York: Teachers College Bureau of Publications,
1960.




EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 4, 1962


A NOTE ON THE ADJUSTMENT OF FOURFOLD TABLES 
FOR “CURVILINEARITY” 


NORMAN CLIFF 
Educational Testing Service 


THE investigator who wants an index of the degree of association
between dichotomous variables, which he can interpret and manipu- 
late in the same ways he would a Pearson product-moment correla- 
tion, is confronted with a large array of coefficients from which to 
choose. He must weigh the undesirable characteristics of each 
against its advantages and make his choice on the basis of his purposes
and the characteristics of his data. Unfortunately, the advantages
and disadvantages are not always set forth in the sources
in which he finds the computing formulas, even though the source 
may be a generally competent and reliable one. 

Guilford (1954, pp. 530-531) reports a suggestion by J. W. Holley
concerning the correction of fourfold tables for “curvilinearity.” The
suggestion is that in a fourfold table such as that below, the a and

                 Item 1
               +       -
Item 2   +     a       b
         -     c       d

d cells be averaged and the average substituted for both. The b and
c cells are similarly averaged, so that the marginal frequencies are
thus adjusted to .50 for both items. A tetrachoric correlation is then
computed from the adjusted cell frequencies. Guilford goes on to
point out that the resulting coefficient is the same as the “unlike



signs” correlation if the appropriate substitutions are made in the
well-known approximation formula

$$ r_{tet} \approx \cos\left(\frac{\pi\sqrt{bc}}{\sqrt{bc} + \sqrt{ad}}\right). \qquad (1) $$

In considering the appropriateness of the adjustment, let us first
examine the notion of curvilinearity, applied to a fourfold table, by
reviewing the identity of the tetrachoric correlation. If one takes a
bivariate-normal distribution in which the two variables have a
certain correlation and bifurcates the two marginal distributions at
points corresponding to p and p’, certain proportions of the bivariate
distribution will fall in each of the four cells of the resulting fourfold
table. These proportions can be used to compute a tetrachoric
correlation which is, in theory, the same as the correlation between
the continuous variables except for sampling fluctuations and the
moderate inaccuracies introduced by the particular approximation
formula used. This is true regardless of which points are selected for
p and p’. Despite the correlation's independence of the marginals,
Guilford remarks on a “disproportionately” small cell frequency
which leads to a “spuriously” high correlation. This would lead one
to believe that $r_{tet}$ should not be used if p and p’ differ appreciably
since this will always result in one cell appearing to be very small,
especially if there is a positive correlation. Such a belief has, of
course, no foundation.


Furthermore 


ftot = COS т 


the computed correl 
we have Seen, this is 


sembl 1 
es a tetrachoric very closely; the marginal frequencies have 8 


ficient. When the proportions in 


$$ r_{us} = \cos \pi (b + c). \qquad (2) $$


NORMAN CLIFF 723 


The proportion (b + c) has a minimum of |p - p’| and a maximum
of (p + p’) or (2 - p - p’), whichever is smaller. (The correlation
has a maximum where the angle has a minimum, and vice
versa.)

These limits mean that, if there are 90 per cent positive responses
on each of two items, the computed correlation between the two
must be at least .81, regardless of the degree and direction of the
true relationship. Conversely, if the proportions of positive responses
to the two items are .90 and .10, respectively, then the computed
correlation is at most -.81.
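
The two limiting cases can be verified numerically from formula (2) and the stated bounds on (b + c). The function name `unlike_signs_bounds` is introduced here only for illustration.

```python
import math

def unlike_signs_bounds(p, p_prime):
    """Smallest and largest values formula (2) can take for given marginals.

    (b + c) lies between |p - p'| and the smaller of (p + p') and
    (2 - p - p'); the cosine decreases as (b + c) grows from 0 to 1.
    """
    min_bc = abs(p - p_prime)
    max_bc = min(p + p_prime, 2.0 - p - p_prime)
    r_max = math.cos(math.pi * min_bc)
    r_min = math.cos(math.pi * max_bc)
    return r_min, r_max

# Both items with 90 per cent positive responses.
same_low, same_high = unlike_signs_bounds(0.90, 0.90)

# Items with 90 and 10 per cent positive responses.
opposite_low, opposite_high = unlike_signs_bounds(0.90, 0.10)
```

With equal marginals of .90 the lower limit is cos(.2π), about .81; with marginals of .90 and .10 the upper limit is cos(.8π), about -.81.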

Exact expressions for the effect of the adjustment are cumbersome
to evaluate because they involve three variables, but, generally
speaking, the correlation seems to increase as the average of p and
p’ approaches unity or zero and to decrease as they differ from each
other. Furthermore, the effects may be quite enormous relative to
typical interitem correlations of .05 to .25.

Table 1 further illustrates the effect of the adjustment. It contains
the correlation between items from two sets as computed from
formula (2) when the correlation from formula (1) is .50 in all
cases. The row and column heads give the respective item difficulties.
Thus the first entry in the first row (.89) is the formula (2) correlation
between two items of .90 difficulty when the formula (1)
correlation is .50. Formula (1) is an approximation to the exact $r_{tet}$
which is generally accurate to small errors in the second decimal
place (Davidoff, 1954).
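
Entries of this kind can be reproduced with a short sketch. It assumes that a is the proportion positive on both items, so that p = a + b and p’ = a + c, and it finds the cell proportions by bisection. The function names are illustrative, and the search is a convenience rather than the computing method used for the table.

```python
import math

def formula_1(a, b, c, d):
    """Cosine-pi approximation computed from the four cell proportions."""
    root_bc = math.sqrt(b * c)
    return math.cos(math.pi * root_bc / (root_bc + math.sqrt(a * d)))

def cells_for(p, p_prime, r1, iterations=200):
    """Cell proportions with marginals p and p' whose formula (1) value is r1.

    Requires -1 < r1 < 1. Formula (1) equals r1 when ad = k * bc, with k
    fixed by r1; a * d - k * b * c increases with a, so bisection applies.
    """
    ratio = math.acos(r1) / math.pi          # sqrt(bc) / (sqrt(bc) + sqrt(ad))
    k = (1.0 / ratio - 1.0) ** 2             # required value of ad / bc
    low, high = max(0.0, p + p_prime - 1.0), min(p, p_prime)
    for _ in range(iterations):
        a = (low + high) / 2.0
        b, c, d = p - a, p_prime - a, 1.0 - p - p_prime + a
        if a * d - k * b * c < 0:
            low = a
        else:
            high = a
    a = (low + high) / 2.0
    return a, p - a, p_prime - a, 1.0 - p - p_prime + a

def formula_2(p, p_prime, r1=0.50):
    """Correlation from the adjusted table when formula (1) gives r1."""
    a, b, c, d = cells_for(p, p_prime, r1)
    return math.cos(math.pi * (b + c))

both_90 = formula_2(0.90, 0.90)   # two items of .90 difficulty
mixed = formula_2(0.30, 0.70)     # difficulties of .30 and .70
extreme = formula_2(0.90, 0.10)   # difficulties of .90 and .10
```

Under these assumptions the three cases come out near .89, .07, and -.82, the figures quoted in the surrounding text.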

Clearly, the effect of the adjustment can be extremely marked; in 


TABLE 1 


Correlations Computed from Adjusted Fourfold Tables*

                         Set A Item Difficulties p
Set B Item
Difficulties p'      .90    .70    .50    .30    .10


* Correlations in unadjusted tables are .50 in all cases.




one case the .50 changes to -.82 and even the correlation between
items with the moderately different difficulties of .30 and .70 drops 
to .07. Thus, under common circumstances, the magnitude of the 
correlation computed by the adjustment—and the equivalent unlike 
signs coefficient—depends very largely on the item difficulties or 
popularities rather than on the degree of association. 

It is hoped that the considerations presented here will deter in- 
vestigators from treating "correlations" derived by the method dis- 
cussed as bearing any close resemblance in meaning to either the 
product-moment or tetrachoric correlations.


REFERENCES 


Davidoff, M. D. “Note on ‘A Table for the Rapid Determination of
the Tetrachoric Correlation Coefficient.’” Psychometrika, XIX
(1954), 163-164.

Guilford, J. P. Psychometric Methods (Second Edition). New York:
McGraw-Hill, 1954.




EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 4, 1962


FACTOR ANALYSIS AS A CONTROLLING TECHNIQUE¹


HARVE E. RAWSON 


AND 
SALOMON RETTIG 
Department of Psychiatry, The Ohio State University 


Factor analysis has often been utilized as a technique which re- 
duces and organizes massive data into meaningful dimensions. 
These dimensions are usually viewed as descriptive, i.e., describing 
the underlying reference vectors of a construct, scale, or behavioral
phenomenon under study. The use of factor analysis for the purpose 
of controlling extraneous influences such as response acquiescence 
and socioeconomic status on research findings has rarely been men- 
tioned. If such influences are known prior to the research, but cannot 
be controlled by means of the experimental design, some of their 
effects can be measured by analysis of variance or by partial or 
semi-partial correlational techniques (Rawson & Rettig, 1962).
However, in studies where these contaminating influences are not
specifically known, but presumed to be pervasive, factor analysis
often proves to be very efficient in isolating and identifying their 
effects. The purpose of this note is to illustrate the use of factor 
analysis as a statistical technique for the control and isolation of 
these extraneous and/or “contaminating factors” in two very different
studies.

In a recent sociopsychological experiment (Rawson, Rettig & 
Pasamanick, 1961) which attempted to study the relationship of 
ethical judgments to unethical behavior, it was hypothesized that: 
(1) certain predictive variables (ethical judgments) would load on
the same dimension as a specified criterion of unethical behavior


¹ This work was supported in part by a research grant, M-5042, from the
National Institute of Mental Health.




(exploitative deception), forming a unidimensional pattern of criterion-predictor
loadings; and that (2) various contaminating effects
inherent in the experimental setting² would form separate
orthogonal dimensions. Thus, it was conjectured that the orthogon- 
ality would isolate the unidimensional criterion-predictor relationship 
for further study. This isolated factor would be free from con- 
taminating influences. Furthermore, the dimensionality of the con- 
taminating variables could be separately determined. This would 
yield added information concerning: (1) the extraneous influences 
due to the experimental setting; (2) the effects of these contami- 
nating influences on the predictor-criterion relationships (by com- 
paring the original correlation matrix to a reproduced matrix of the 
predictor-criteria dimension); and (3) the co-variations among 
these contaminating variables—a valuable aid in future research 
where effective control measures are sought. 

The first three orthogonal factors extracted are shown in Table 


TABLE 1 
Unethical Behavior Study 
Unrotated and Rotated Factor Loadings (N = 74) 
Unrotated Rotated* 
CRITERION ENG i uz ш 
1. Exploitative deception 8 28 41 з 0 00 
аот 
oral Value Factor Scales 

2. бешки) 52 —94 —17 be -u —27 
8.  Religio-family (B) BSG —25. as 27 -À 
Е зат endeavor (C) 12 -32 —07 12 —22 —2 
6 mM DIT 64 -29 —30 64 -07 =4 
ES D | ve-manipulative (E) —24 2 651 -24 -05 68 

. Social security (F) 05 36 —10 05 24 2 
CONTROLS 

8. Social desirability 30 33 -21 30 39 0 
m Year in school 7 е EN D = -N o 
ij a 39 п тт з и 2 

. Socialization residence 17 30 «¢ 17 0 4 
13. Religious preference 04 : : A 0 4 
14. Church attendance iH d i -19 


35 17 —34 35 32 
* Factors II and III rotated 34° clockwise; rotation of Factor I did not improve simple
structure.


² This experiment measured deception under conditions of high monetary
reinforcement for such behavior. A stooge who “cheated” according to a prescribed
schedule was also present to facilitate such deceptive behavior.


RAWSON AND RETTIG 727 


1. Factor I has high loadings on "general" and “puritanical” judg- 
ments. These judgments primarily tend to reflect an acquiescence to 
socially desirable and idealized moral norms. Additional lower 
loadings appear on religio-family judgments, social desirability, 
curriculum choice, church attendance, and the criterion of exploita- 
tive deception. This factor could probably be best described as a 
contaminating dimension of generalized acquiescence. Factor II iso- 
lates the age-related variables into a separate dimension. This 
dimension is unrelated to the hypothesis under investigation. Factor 
III loads high on the exploitative judgments and the exploitative 
criterion. Additional secondary loadings appear on socialization resi- 
dence, religious preference, and puritanical judgments. This factor 
is the predicted criterion dimension which demonstrates the major 
hypothesis under investigation: that exploitative-manipulative 
moral judgments are related to exploitative behavior. The relation 
of the criterion-related variables can best be estimated by the 
product of their loadings. For example, the reproduced correlation
between the hypothesized predictor and the criterion is .29 (.58 ×
.50). In addition, socialization residence, religious preference, and
anti-puritanical judgments (negative loading) also relate to the 
behavioral criterion and can be considered clouding conditions as- 
sociated with the hypothesized predictor-criterion relationship. This 
added information may be of importance for further research. 
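
The product-of-loadings estimate used above is the single-factor case of the usual rule for reproducing a correlation from orthogonal factor loadings. A minimal sketch follows; the function name is illustrative, and only the Factor III loadings quoted in the text (.58 and .50) are used.

```python
def reproduced_correlation(loadings_x, loadings_y):
    """Correlation reproduced from loadings on the same orthogonal factors.

    With orthogonal factors this is the sum of the products of the two
    variables' loadings, taken factor by factor. Passing the loadings on a
    single factor gives that factor's contribution alone.
    """
    return sum(x * y for x, y in zip(loadings_x, loadings_y))

# Factor III loadings quoted in the text: predictor .58, criterion .50.
predictor_criterion = reproduced_correlation([0.58], [0.50])
```

Restricting the sum to the criterion factor leaves out whatever covariation the two variables share through the other, contaminating dimensions, which is the sense in which the factor is used as a control.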

A second neuropsychiatric study (Rettig, Pasamanick, Knobloch 
& Rawson, 1961) attempted to demonstrate the effects of certain 
neonatal conditions of the infant and pregnancy complications of 
the mother on the neurologic status of the infant at 40 weeks. It was 
suspected that the demonstration of this effect could only be shown 
if the socioeconomic status of the mother could be controlled, since 
previous work (Pasamanick, Knobloch & Lilienfeld, 1956) had indi- 
cated that socioeconomic status may have a critical relation to all 
of the above indices. Since information on the socioeconomic status 
of the mother was available, it was included in a factor analytic 
design. 

Table 2 shows the first two orthogonal factors extracted. After 
rotation, Factor I is clearly the hypothesized criterion-predictor 
dimension. This factor has high loadings on the neurologic and 
developmental criteria, the neonatal conditions, and the pregnancy 
complications. Factor II is clearly a dimension delineating the socio-




TABLE 2 


Neuropsychiatric Prediction Study 
Unrotated and Rotated Factor Loadings (N = 841) 


Unrotated Rotated* 
CRITERIA: I II I II 
Neurologic 

1. Over-all neurologic status 50 62 80  -0 
2. Difficulty in retaining 35 49 60 —05 
3.  Maldirected reaching 41 40 67 06 

4.  Fisted hands 35 45 57 —02 

5. Sitting on narrow base 34 30 45 07 

6. Fine motor quotient 32 30 44 05 

7. Spasticity and hypertonicity 31 24 38 08 

8. Gross motor quotient 26 21 38 06 

Developmental (at 40 weeks): 

9. Height —51 00 —88 —89 
10. Weight -52 —05 =87  -8t 
PREDICTORS: 

Neonatal Conditions: 
11. Birth weight 14 22 -ég -0 
12. Oxygen given 24 hours or more 31 42 2, 28 
13. Respiratory distress 19 29 34 —04 
14. anosis 11 29 49 =0 
15. Convulsions Ир 41 в -ù 
Pregnancy Complications: 
16. Premature separation 13 27 29 —07 
17. Bleeding 19 19 27 02 
Type of Delivery: 
18. Spontaneous vertex 58 —47 01 75 
19. Forceps —58 34 ыў —66 
E Number of Previous Pregnancies: 
2. Total 52 —45 —01 68 
mol 44 —46 -07 63 
; mature 33 —05 17 28 
CONTROLS 
23. Race 0 
24. Census tract | aa um Б. 
25. Education of mother 34 30 01 —45 
26. Type of medical care —53 4 6 01 —70 
27. Number of prenatal visits —42 19 -12 -# 
: Menon 


* Rotated 40° counterclockwise. 


economic status of the mother. The orthogonality of the two factors 
leaves the criterion dimension free of all extraneous socioeconomic 
influences. (As can also be observed, height and weight of the 40-
week infant appear to be a function of both conditions at birth and 
socioeconomic status since they load on both factors.) 
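The 40° rotation of Table 2 can be sketched numerically. The sign and labeling convention below is an assumption inferred from rows of the table that are legible in both halves; the source states only that the rotation was 40° counterclockwise. The function name is hypothetical.

```python
import math


def rotate_loadings(f1, f2, degrees=40.0):
    """Rotate one variable's pair of orthogonal loadings through `degrees`.

    Assumed convention (inferred from legible rows of Table 2, not stated
    in the source): new I = f1*sin + f2*cos, new II = f1*cos - f2*sin.
    """
    t = math.radians(degrees)
    new1 = f1 * math.sin(t) + f2 * math.cos(t)
    new2 = f1 * math.cos(t) - f2 * math.sin(t)
    return new1, new2


# Unrotated loadings for two rows of Table 2.
rows = {
    "Difficulty in retaining": (0.35, 0.49),
    "Spontaneous vertex": (0.58, -0.47),
}
for name, (f1, f2) in rows.items():
    r1, r2 = rotate_loadings(f1, f2)
    print(f"{name}: {r1:.2f} {r2:.2f}")
```

Under this convention the two rows come out at .60, -.05 and .01, .75, which agree with the rotated columns printed for those variables. Because the transformation is orthogonal, each variable's communality (the sum of its squared loadings) is unchanged by the rotation.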

In both studies, the factor analytic design proved to be an efficient 
technique for control purposes. In the sociopsychological study, an 
orthogonal dimension loading on the criterion and on the crucial 
predictive variables was extracted. After rotation, this criterion-
predictor dimension appeared to be relatively free from extraneous 
and contaminating influences such as generalized acquiescence and 
age. In the neuropsychiatric study, the first two factors extracted 
were a predicted criteria dimension and a socioeconomic control di-
mension. In both studies, variables unrelated to the major hy-
potheses were isolated to form separate dimensions. The variance 
due to these contaminating effects had been partialed out from the 
hypothesized criteria-predictor dimension. Furthermore, additional 
information was obtained concerning the clustering of such ex-
traneous influences. 


REFERENCES 


Pasamanick, B., Knobloch, H., and Lilienfeld, A. M. “Socioeconomic 
Status and some Precursors of Neuropsychiatric Disorder." 
American Journal of Orthopsychiatry, XXVI (1956), 594-601. 

Rawson, H. E. and Rettig, S. “Controlling the Effects of ‘Clouding 
Variables’ in Multivariate Research Designs.” EDUCATIONAL AND 
PSYCHOLOGICAL MEASUREMENT, XXII (1962), 493-500. 

Rawson, H. E., Rettig, S., and Pasamanick, B. “The Relationship of 
Exploitative-Manipulative Value Judgments to Exploitative Be-
havior under Conditions of High and Low Ethical Risk." Re-
search Division, Department of Psychiatry, The Ohio State Uni-
versity, 1961 (mimeograph). 

Rettig, S., Pasamanick, B., Knobloch, H., and Rawson, H. E. “A 
Multi-Dimensional Computer Analysis of some Complications 
of Pregnancy and their Neuro-Psychiatric Sequelae: An Ex-
ploratory Investigation.” Research Division, Department of 
Psychiatry, The Ohio State University, 1961 (mimeograph). 


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 4, 1962 


A METHOD FOR CORRECTING ITEM-TOTAL 
CORRELATIONS FOR THE EFFECT OF 
RELEVANT ITEM INCLUSION 


KENNETH I. HOWARD AND GARLIE A. FOREHAND* 
University of Chicago 


One method for item-analyzing a scale involves the assessment 
of item-total correlations (see Guilford, 1954, pp. 417-443). Using 
this method one simply computes the correlation of each item with 
the sum of the items (total score) and eliminates those items that 
seem to be more poorly correlated with the total. The most rigorous 
way of pursuing this procedure would involve correlating each item 
with the sum of the remaining items (total minus the relevant item) 
in order to avoid the spurious correlation inflation caused by the 
relevant item’s inclusion in the total. This latter approach, how- 
ever, is quite time consuming and the usual practice is to assume 
that each item-total correlation is equally inflated. 
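The two procedures just contrasted can be sketched directly. The following is a minimal illustration, assuming item scores are held in a respondents-by-items array; the function name and the synthetic data are hypothetical and serve only to show the inflation that arises when an item is included in its own total.

```python
import numpy as np


def item_total_correlations(scores):
    """Return (r_it, r_i_rest) for every item.

    scores: 2-D array, rows = respondents, columns = items.
    r_it correlates each item with the total including that item;
    r_i_rest correlates it with the total of the remaining items.
    """
    scores = np.asarray(scores, dtype=float)
    total = scores.sum(axis=1)
    r_it, r_rest = [], []
    for i in range(scores.shape[1]):
        item = scores[:, i]
        r_it.append(np.corrcoef(item, total)[0, 1])
        r_rest.append(np.corrcoef(item, total - item)[0, 1])
    return np.array(r_it), np.array(r_rest)


# Synthetic illustration only: 200 respondents, 6 items sharing one trait.
rng = np.random.default_rng(0)
trait = rng.normal(size=(200, 1))
data = trait + rng.normal(size=(200, 6))
r_it, r_rest = item_total_correlations(data)
print(np.round(r_it - r_rest, 3))  # inflation due to including the item
```

The printed differences are the spurious inflation for each item. They need not be equal across items, which is why the equal-inflation assumption is only a convenience.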

The above assumption is probably adequate for deciding between 
items in the measurement of one scale, but is not applicable to the 
problem of discriminant item analyses. 

* Now at Carnegie Institute of Technology. 

A valuable extension to the convergent and discriminant approach 
to construct validity of Campbell and Fiske (1959) can be made in 
the area of test construction. In their now classic article, they stress 
that trait measurement must be validated by low correlations with 
tests from which they were intended to differ, as well as high cor-
relations with tests with which they were intended to agree. Camp-
bell and Fiske proposed the evaluation of this discrimination and 
convergence by the multitrait-multimethod matrix which involves 
the measurement of more than one trait by each of at least two 
methods. In test construction we are generally limited to one 
method (the test), but we are not limited to one trait or variable. 
If one were to construct a test that measured two or more traits, 
then one could evaluate the correlations of items with the total score 
of the scale in which they are included and with the total scores of 
the other scales. To maximize the convergent and discriminant 
validity of the scales, one could choose items that yield scale scores 
whose intercorrelations are near zero, or do not exceed a specified 
value in the standardization sample. (For further discussion of 
this procedure, see Loevinger, Gleser, and DuBois [1953], especially 
their discussion of scale saturation.) For the case where the test is 
going to be correlated with other data and the resulting matrix fac-
tor-analyzed, one can thus minimize the chance of a method factor's 
emergence. In order to select items by these criteria, one must com-
pare an item’s correlation with its own scale with its correlation with 
other scales. The spurious inflation of correlation resulting from in-
clusion of the relevant item makes this comparison difficult, and it 

[...] interpreting the results of cluster analysis. For ex-
ample, linkage analysis (McQuitty, 1957) provides a method de[...] 
[...]s minimized. The relation of a [...] 
[...]ter relevancy) is generally ex[...] 
[...]e variable and the total cluster [...] 
[...] the inclusion of the relevant [...] 


where the subscript i refers to the relevant item, and t to the total 
score (including the relevant item). 

An alternative formulation—equivalent to the above, but derived 
independently from the formula for the correlation of sums—re- 
veals some interesting properties of the correction for relevant item 
inclusion. This alternative formula suggests a convenient ap- 
proximation to the full equation, and permits examination of the 
conditions under which the approximation is satisfactory. 

The correlation given in equation (1) above may be expressed 
as the correlation between item i and the sum of remaining items: 

$$r_{i(t-i)} = \frac{\sum_{j \neq i} \sigma_i \sigma_j r_{ij}}{\sigma_i \sigma_{(t-i)}} = \frac{\sum_{j \neq i} \sigma_j r_{ij}}{\sigma_{(t-i)}} \qquad (2)$$

Similarly, the correlation between item i and the total score (item 
i included) is: 

$$r_{it} = \frac{\sum_{j \neq i} \sigma_j r_{ij} + \sigma_i}{\sigma_t} \qquad (3)$$

If the left-hand term of equation (3) is added to (2), and the right-
hand term subtracted, and terms are collected, then 

$$r_{i(t-i)} = r_{it} - \frac{\sigma_i}{\sigma_t} + \sum_{j \neq i} \sigma_j r_{ij} \left( \frac{1}{\sigma_{(t-i)}} - \frac{1}{\sigma_t} \right) \qquad (4)$$

Applying equation (3) to the summed term in (4), and writing 
$\sigma_{(t-i)}$ as the standard deviation of a difference, 

$$r_{i(t-i)} = r_{it} - \frac{\sigma_i}{\sigma_t} + \left[ (r_{it}\sigma_t - \sigma_i) \left( \frac{1}{\sqrt{\sigma_t^2 + \sigma_i^2 - 2 r_{it} \sigma_i \sigma_t}} - \frac{1}{\sigma_t} \right) \right] \qquad (5)$$

Examination of equation (5) reveals that the third term (in 
brackets) will approximate zero (a) when $\sigma_i$ is small relative to $\sigma_t$ 
and (b) when $r_{it}$ is small and positive. In a case where item vari-
ances are small relative to total variance and item-intercorrelations 
are low (a not unusual case in item analysis), a good approximation 
to the above formula is given by: 

$$r_{i(t-i)} \approx r_{it} - \frac{\sigma_i}{\sigma_t} \qquad (6)$$

The approximation is a lower bound for $r_{i(t-i)}$ when $\sigma_i/\sigma_t < r_{it}$, or 
when $\sigma_i/\sigma_t > 2r_{it}$. Under one of these conditions, a test constructor can 
determine whether the approximate value meets his criterion for in-
cluding an item in the scale. If so, then the exact value must meet 
the criterion; if not it may be necessary to compute $r_{i(t-i)}$ exactly, 
by equation (1) or (2). 
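The relation between the exact correction and the approximation can be checked numerically. The sketch below assumes the standard part-whole form, (r_it·σ_t − σ_i) / sqrt(σ_t² + σ_i² − 2·r_it·σ_i·σ_t), which is what equation (5) reduces to when its three terms are combined. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def corrected_r(r_it, sd_i, sd_t):
    """Exact item-remainder correlation from r_it, the item SD and the total SD."""
    sd_rest = np.sqrt(sd_t ** 2 + sd_i ** 2 - 2.0 * r_it * sd_i * sd_t)
    return (r_it * sd_t - sd_i) / sd_rest


def approx_r(r_it, sd_i, sd_t):
    """Approximation: r_it minus the ratio of item SD to total SD."""
    return r_it - sd_i / sd_t


# Synthetic illustration only: 300 respondents, 10 items sharing one trait.
rng = np.random.default_rng(1)
trait = rng.normal(size=(300, 1))
data = 0.6 * trait + rng.normal(size=(300, 10))
total = data.sum(axis=1)
item = data[:, 0]

r_it = np.corrcoef(item, total)[0, 1]
sd_i, sd_t = item.std(), total.std()
direct = np.corrcoef(item, total - item)[0, 1]  # item vs. remaining items
exact = corrected_r(r_it, sd_i, sd_t)
approx = approx_r(r_it, sd_i, sd_t)
print(round(direct, 4), round(exact, 4), round(approx, 4))
```

The first two printed values agree because the closed form is an algebraic identity. In this synthetic scale the ratio of item SD to total SD is smaller than r_it, so the third value falls at or below them, as the lower-bound condition requires.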

As an example of the goodness of the approximation given in ex-
pression (6), the approximate value was compared with the com-
puted $r_{i(t-i)}$ in 139 cases, involving 12 scales ranging in size from 4 
to 19 items. Absolute discrepancies between actual and approximate 
values ranged from .000 to .183 (in the case of the 4-item scale) 
with a median of .009. Eighty-one per cent of the discrepancies were 
below .025; only six discrepancies exceeded .055. 

As experience in using the formula accumulates, it may be poss-
ible to specify relative values of $\sigma_i$, $\sigma_t$ and $r_{it}$ for which the approxi-
mation will be adequate, and those for which the full formula should 
be used. The most efficient procedure for using the formulas at 
present would appear to be as follows: For a given scale (and thus 
a given $\sigma_t$), use equation (5) to compute $r_{i(t-i)}$ for (a) the item with 
the highest $\sigma_i$, and (b) the item with the highest $r_{it}$. The two condi-
tions will often coincide; when they do, the bracketed term in the 
expression will show the maximum discrepancy between $r_{i(t-i)}$ and 
the approximated value, for that particular scale. When the two 
conditions do not coincide the maximum discrepancy may still be 
obtained [...] quickly by applying formula (5) to additional 
[...], until an item thus examined [...]. 
The [...] discrepancy thus observed will be the 
maximum discrepancy for the entire scale. If the discrepancy thus 
observed is tolerable to the investigator, then the approximation 
may be safely used. Otherwise, the full expression must be com-
puted. It is always necessary to use the full expression when $r_{it}$ is 
negative. 
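The screening procedure described above can be sketched as follows. This version simply evaluates the bracketed discrepancy for every item of a scale and does not use the authors' shortcut of examining only the items with the highest item SD or highest item-total correlation. The function names and the three-item scale are hypothetical.

```python
import math


def bracket_term(r_it, sd_i, sd_t):
    """Exact corrected correlation minus the approximation r_it - sd_i/sd_t."""
    sd_rest = math.sqrt(sd_t ** 2 + sd_i ** 2 - 2.0 * r_it * sd_i * sd_t)
    exact = (r_it * sd_t - sd_i) / sd_rest
    return exact - (r_it - sd_i / sd_t)


def max_discrepancy(items, sd_t):
    """items: list of (r_it, sd_i) pairs for one scale.

    Returns (largest absolute discrepancy, index of the item showing it).
    """
    gaps = [abs(bracket_term(r, s, sd_t)) for r, s in items]
    worst = max(range(len(gaps)), key=gaps.__getitem__)
    return gaps[worst], worst


# Hypothetical scale: total SD of 4.0 and three items given as (r_it, sd_i).
scale = [(0.45, 0.5), (0.60, 0.9), (0.30, 0.4)]
gap, idx = max_discrepancy(scale, sd_t=4.0)
print(idx, round(gap, 4))
```

In this made-up scale the second item has both the largest SD and the largest item-total correlation, and it also shows the largest discrepancy, which is the coinciding case the text describes. If the returned discrepancy is tolerable, the approximation may be used for the whole scale.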


REFERENCES 


Campbell, D. T. and Fiske, D. W. “Convergent and Discriminant 
Validation by the Multitrait-Multimethod Matrix.” Psychologi-
cal Bulletin, LVI (1959), 81-105. 

Guilford, J. P. “The Correlation of an Item with a Composite of the 
Remaining Items in a Test.” EDUCATIONAL AND PSYCHOLOGICAL 
MEASUREMENT, XIII (1953). 

Guilford, J. P. Psychometric Methods. New York: McGraw-Hill, 
1954. 

Loevinger, Jane, Gleser, Goldine C., and DuBois, P. H. “Maximiz-
ing the Discriminating Power of a Multiple-Score Test.” Psycho-
metrika, XVIII (1953), 309-317. 

McQuitty, L. L. “Elementary Linkage Analysis for Isolating Orthog-
onal and Oblique Types and Typal Relevancies.” EDUCATIONAL 
AND PSYCHOLOGICAL MEASUREMENT, XVII (1957), 207-229. 

Zubin, J. “The Method of Internal Consistency for Selecting Items.” 
Journal of Educational Psychology, XXV (1934), 345-356. 


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 4, 1962 


ACQUIESCENCE AND SUGGESTIBILITY IN CHILDREN¹ 


H. B. GIBSON 


Institute of Psychiatry, Maudsley Hospital 
University of London 


The study of acquiescent set in questionnaires was stimulated by 
Cronbach’s article of 1950. Cronbach showed that response sets of 
various kinds, including that of the tendency to agree rather than 
disagree with items in questionnaires, operated in many conven-
tional tests. Much subsequent work has been done with the Cali-
fornian F scale, as this scale lends itself to the operation of an ac-
quiescent set by reason of its arrangement. All the items are worded 
in a single direction and thus an “authoritarian” score results from 
persistent agreement. Many studies investigating this scale have 
shown that the F score is positively correlated with an independent 
component of acquiescence; a study by Bass (1955) indicated that 
as much as 75 per cent of the variance can be accounted for in terms 
of acquiescence. A defense of the validity of the F scale was made 
by Gage, et al. (1957) who maintained that the psychological mean-
ing of acquiescence resembles that of authoritarianism. This opinion 
has not been confirmed by studies of authoritarian content and ac-
quiescent set by Messick and Frederiksen (1958), Couch and Ken-
iston (1960), and Prentice (1958), but some slight support comes 
from Eysenck’s (1962) study with neurotics. 

There is now ample evidence of the fact that acquiescence does in 
fact generalize from one questionnaire to another, but what is not 
clear is what sort of aspect of a subject’s personality is associated 
with his tendency to endorse rather than deny items in question-
naires irrespective of their content. This problem is obviously of 
great importance to those who are concerned with testing person-
ality by means of questionnaires. Eysenck’s (1962) study has shown 
that, with hospitalized neurotics, acquiescence has no significant 
effect upon their self-ratings on personality criteria, but it may 
be that normal subjects, and children in particular, have an attitude 
of comparative indifference to personality ratings such that a re-
sponse set may contaminate the results of those particularly high or 
low in acquiescence. 

¹ This research was made possible through a grant from the Nuffield Founda-
tion. The author would like to express his appreciation for the advice given by 
W. D. Furneaux in the preparation of this paper. 

Earlier studies of personality in relation to response set have been 
concerned with the extreme response either way, and have not in-
vestigated the personality differences between those who are high 
and those who are low on acquiescence. Thus Berg and Collier 
(1953) found that subjects high on anxiety tended to have extreme 
sets on the PRT (Perceptual Reaction Test), and, furthermore, that 
those high on the Body Sway test were extreme scorers. They do not 
report, however, whether the high swayers were predominantly ac-
quiescent. Lewis and Taylor (1955) confirmed these findings with 
regard to anxiety and extreme response set, and Berg (1955) used 
these and similar findings to formulate his “deviation hypothesis” 
concerning the diagnostic value of response set. 

Later studies have turned attention on acquiescence as a per-
sonality variable. Guba and Getzels (1954) identified questionnaire 
acquiescence with “suggestibility,” without clearly defining what 
they meant by suggestibility. Barnes (1955), using the PRT with 
normal and psychiatric subjects, found that abnormal groups were 
higher on acquiescence, except for those with character disorders 
who were at the other extreme. Jackson, et al. (1957) found that 
acquiescence correlated positively with “rigidity,” as shown on 
Einstellung water jar problems, a result which may be indirectly 
confirmed by the above-mentioned study by Messick and Frederik-
sen (1958), which implies that an acquiescent set may be consistent 
with the construct of “intolerance of ambiguity.” The latter authors 
also found a negative correlation between acquiescence and Verbal 
Knowledge, General Reasoning and Deduction. Later work by 
Frederiksen and Messick (1959) shows acquiescence to be associ-
ated with a generally uncritical attitude. The most recent important 
study in this field is that of Couch and Keniston (1960) who factor-
analyzed the intercorrelations between a variety of personality 
measures and a measure of acquiescence. They then set up hypothe-
ses concerning the personality characteristics of subjects extremely 
high and extremely low on acquiescence, and tested them by means 
of a structured clinical interview technique. Their results, which 
they expressed in Freudian terminology, indicate that those subjects 
who were particularly high on acquiescence were extraverted, and 
vice versa. 

The present study is concerned with children and was done as part 
of a general investigation into suggestibility. In view of the studies 
reviewed above, it was expected that the phenomenon of acquiescent 
set would be observed in the questionnaire responses of children. 
An hypothesis was suggested that acquiescence would correlate posi-
tively with primary suggestibility, measured by the Body Sway 
test, as defined by Eysenck (1943). 


Method 


The subjects were 99 boys and 119 girls; all were normal children 
between the ages of 8 10/12 and 14 years (Mean Age = 11 3/12 years, 
S.D. = 18 months). The means and S.D.'s of the two sexes were not 
significantly different. All subjects completed the Junior Maudsley 
Personality Inventory (JMPI), a 44-item inventory devised by 
Furneaux and Gibson (1961a), which gives measures of Extraver-
sion and Neuroticism. 

After completing this inventory, 45 of the boys and 31 of the girls 
were given a battery of suggestibility tests, including the Body 
Sway test. Selection was made on a random basis where possible, but 
factors such as obtaining parental consent and the availability of 
the children also determined which of the subjects were given the 
suggestibility tests. 

The Body Sway test was administered in the following manner. 
The subject was blindfolded and placed with his feet together in 
front of a kymographic sway recorder, to which he was attached by 
a thread from his collar. For a period of 30 seconds the experimenter 
spoke to him reassuringly while he settled into a comfortable and 
relaxed posture, and then a taped record of verbal suggestions of 
forward sway was played for 90 seconds. The movements of the 
subject were thus recorded for the whole of this period, or, in some 
cases, until he actually fell forward and was caught by the experi-
menter. 


Results 


It has been shown by Helmstadter (1957) that for inventories 
where the response to each item is either “Yes” or “No” (or, as in 
the case of the JMPI, “Same” or “Different”) a set score may be 
calculated which is independent of the content score. It is assumed 
that two main factors affect the subject’s response to every item; a 
consideration of the actual content of the item and a general tend- 
ency to respond to items in terms of a positive, neutral, or negative 
set. The positive set is referred to in the present study as Acquies- 
cence, and Helmstadter means positive set in referring to the Set 
score. 

In Helmstadter's “Postulated Knowledge Procedure,” he reasons 
that the number of items marked in agreement with the key can be 
thought of as resulting from a summation of those items marked in 
this same direction on the basis of content and of those items 
marked in this same direction on the basis of set. He derives an 
expression for the Set score, viz.: 

[expression illegible in the source] 

where the two N terms indicate the number of items keyed positively 
and negatively respectively, and the two W terms indicate the number 
of items marked in opposition to negative and positive keying respec-
tively. This expression will give numerical values for Set scores from 
-1 to +1. 

The mean and standard deviation of the Acquiescence (Set) 
scores of the group of boys were found to be very similar to those of 
the girls' group. Correlating Acquiescence scores with age, it was 
found that for both girls and boys Acquiescence declined with age. 
These relationships are summarized in Table 1. 

TABLE 1 
[Table body not recoverable from the source; surviving column 
headings: Acq/Age r, Significance level] 

A plot of these data suggested that for the boys the regression 
of Acquiescence on age was possibly of a nonlinear nature, 
a marked drop in Acquiescence occurring soon after the eleventh 
year. The linearity of the regression was tested by dividing the 
plot into 10 arrays and carrying out an analysis of variance. The 
variance due to deviation of array means from linear regression was 
found to be insignificant (F = .93 with 8:89 df, p > .05). For pur-
poses of future calculations it was therefore assumed that the re-
gression of Acquiescence on age was of a linear nature, although this 
assumption may not be strictly correct for the boys. 
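The linearity test described above can be sketched as follows. This is one common way of testing deviation of array means from a straight line, assuming arrays of roughly equal size and a line fitted to the array means of the predictor. The source does not give the computational details, so the function, the grouping rule, and the synthetic data are all assumptions.

```python
import numpy as np


def linearity_f(x, y, n_arrays=10):
    """F ratio for deviation of array means from linear regression.

    x is sorted and cut into `n_arrays` groups (arrays) of near-equal size.
    Returns (F, df_deviation, df_within).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.array_split(np.argsort(x), n_arrays)
    grand = y.mean()
    # Between-array and within-array sums of squares.
    ss_between = sum(len(g) * (y[g].mean() - grand) ** 2 for g in groups)
    ss_within = sum(((y[g] - y[g].mean()) ** 2).sum() for g in groups)
    # Sum of squares explained by a straight line through the array means.
    gx = np.concatenate([np.full(len(g), x[g].mean()) for g in groups])
    gy = np.concatenate([y[g] for g in groups])
    slope, intercept = np.polyfit(gx, gy, 1)
    ss_linear = ((slope * gx + intercept - grand) ** 2).sum()
    df_dev = n_arrays - 2
    df_within = len(y) - n_arrays
    f = ((ss_between - ss_linear) / df_dev) / (ss_within / df_within)
    return f, df_dev, df_within


# Synthetic illustration only: 99 subjects, age in months, a linear trend.
rng = np.random.default_rng(2)
age = rng.uniform(106, 168, size=99)
acq = 0.5 - 0.003 * age + rng.normal(0, 0.19, size=99)
f, df1, df2 = linearity_f(age, acq)
print(df1, df2, round(f, 2))
```

With 99 subjects and 10 arrays the degrees of freedom come out as 8 and 89, the same as those reported for the boys. The F value itself depends on the synthetic data and carries no substantive meaning.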

The regression of Acquiescence on age is greater for girls than for 
boys. This is shown by the difference in the correlations between the 
two variables for the two sexes, as given in Table 1. In order to test 
whether this difference in regression was significant, an analysis of 
covariance was carried out to establish the ratio of the variance 
between the regression lines to the residual variance about the re-
gression lines. This ratio was found to be quite insignificant (F = 
.31 with 1:214 df, p > .05). It may therefore be assumed that the 
sex difference in this respect is a negligible factor. Before comparing 
the Acquiescence score with the Body Sway scores, an adjustment 
was made for the observed regression of Acquiescence on age. 


Body Sway 


The response to the Body Sway test can be divided into three 
categories. 

A. The subject sways forward but maintains his balance. 

B. The subject sways forward and loses his balance thereby fall- 
ing forward. 

C. The subject sways backward. 
In this study only one subject made no measurable movement dur- 
ing the test, and she has been allocated to Category A. The results 
of the Body Sway test are presented in Table 2. 


TABLE 2 
Categories Obtained from the Body Sway Test 

                     A                  B           C 
          N    Forward Swayers        Falls    Backward Swayers 
               (excluding falls) 
Boys      45         25                12           8 
Girls     31         14*               12           5 

* Including one subject whose sway was virtually zero. 


Previous work in connection with the Body Sway test in relation 
to suggestibility has established that subjects who fall on this test 
tend to be highly suggestible both in terms of susceptibility to hyp-
nosis and high performance on other tests of primary suggestibility. 
Discrimination, with regard to relative suggestibility, among those 
who sway forward but maintain their balance may be effected by 
measuring the distance they have swayed forward from the vertical. 
It has been found with adults that simply by measuring the maxi-
mum forward sway achieved at any point in the duration of the 
test, a score may be derived which correlates highly with the sub-
ject’s susceptibility to hypnosis (Furneaux, 1946; Furneaux & Gib-
son, 1961b). Forward swayers in the present study were therefore 
scored on this basis. The implications of backward sway are some-
what uncertain; backward swayers among adults have certainly 
tended to be low on other measures of primary suggestibility and 
hypnosis, but it is not clear whether they are less suggestible than 
those who sway forward only a very little. For purposes of compari-
son of Body Sway with other variables in this study, it was there-
fore decided to treat the backward swayers as a separate group. 
It is evident from Table 2 that there is no significant difference 
between the backward swayers in relation to sex. It is also evident 
that proportionally more girls have fallen. By allocating scores as to 
the extent of sway to the forward swayers as indicated above, and 
classifying those who fall with the high scorers, the population may 
be dichotomized taking 4.75 inches as the median, thus: 

            High Sway     Low Sway 
                          (forward only) 
Boys           14            23 
Girls          16            10 

Thus the girls tended to be higher on Body Sway than the boys. 
In order to test for the [...] Body Sway [...] the sexes was 
calculated for each array. The [...] chi square was 7.35 with 4 df, 
which is a little below the .05 level of significance. The difference 
was not significant in any single array of Body Sway scores. 

The regression of Body Sway on age was found to be negligible for 
both boys and girls. 

In considering the question of correlating the Acquiescence scores 
(adjusted for regression on age) with measures of Body Sway, a 
decision has to be made as to the most appropriate correlational 
coefficient to use. The product moment r would necessitate the allo-
cation of numerical scores to those who fall, and, while it is justifi-
able to consider the latter as being among the more highly suggesti-
ble subjects, the allocation of a numerical score to them is more 
debatable. Again the allocation of a numerical score to the back-
ward swayers also raises debatable questions. While backward 
swayers may reasonably be classified among those low on suggesti-
bility, it is not clear whether, for instance, a subject who sways 
backward 1 inch is more or less suggestible than a subject who 
sways forward .5 inch. For purposes of comparison it was decided to 
consider the category of backward swayers as being the least sug-
gestible of all, such an hypothesis being only tentative and modi-
fiable by analysis of the present data. 

With these questions in mind, it was decided to calculate tetra-
choric r’s separately within different portions of the whole array of 
Body Sway scores. The correlations obtained are presented in Ta-
ble 3. 
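For readers unfamiliar with the tetrachoric coefficient, the sketch below uses the cosine-pi approximation, one standard shortcut for a fourfold table. The source does not say how its tetrachoric r's were computed, so this particular method is an assumption, and the function name and cell counts are hypothetical.

```python
import math


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric r for a 2x2 table.

    a and d are the concordant cells (high/high and low/low);
    b and c are the discordant cells.
    """
    if b == 0 or c == 0:
        return 1.0
    if a == 0 or d == 0:
        return -1.0
    return math.cos(math.pi / (1.0 + math.sqrt((a * d) / (b * c))))


# Hypothetical counts: both variables dichotomized near their medians.
print(round(tetrachoric_cos_pi(12, 6, 7, 12), 2))
```

The approximation is most trustworthy when both variables are split near the median, which is one reason to dichotomize sway scores at the median before correlating them with a dichotomized Acquiescence score.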

Discussion 

The results of this investigation into Acquiescence in children’s 
response to a questionnaire are of considerable practical importance. 
Following Helmstadter’s reasoning, it may be seen that as the com-
ponent due to Acquiescence increases so the component due to Con-
tent decreases; thus at the theoretical extremes where the Set score 
would be +1 or -1 the Content score would be zero, i.e. there would 
be no validity in the personality measures derived from the inven-
tories of subjects having an extreme response set. 


TABLE 3 

Tetrachoric Coefficients Obtained by Correlating Acquiescence Scores 
(Adjusted for Age) with Body Sway Scores 

              A                  A + B              A + C               A + B + C 
         Forward swayers    Forward swayers    Forward and 
         excluding falls    plus falls         backward swayers 
Boys       .43 (25)           .72* (37)         [illegible]          [illegible] (45) 
Girls     -.22 (14)           .25 (26)          [illegible]          [sign illegible] .05 (31) 

* Significant at the .01 level; all other correlations are below the .05 level of significance. 

Note: The categories are those referred to in Table 2, and the number of subjects in 
each is stated in parentheses. In each column are entered the values of [...] within 
the specified range of sway scores. 


In fact, the S.D. of the Acquiescence scores for this population of 
children was only .19. The Acquiescence score, positive or negative, 
might in fact be used as a check on the validity of a child's inven-
tory. If we set ±2 standard deviations as the limits, then allowing 
for the regression of Acquiescence on age, we find that 9 of the 218 
subjects are beyond these limits. Of these 9 subjects only 2 are above 
the age of 10 9/12 years. This indicates that extreme response set 
tends to occur more among the younger children, and that possibly it 
is due to lack of proper understanding of the nature of what is re-
quired of them. However, until some external criteria of the validity 
of the personality measures derived from the inventory are estab-
lished, it will not be possible to fix reliable limits to the allowable 
response bias. 
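The validity check suggested above can be sketched as a residual screen. The version below regresses Acquiescence on age and flags protocols lying more than two standard deviations of the residuals from the regression line. Whether the author's limits were based on the residual SD or on the raw SD of .19 is not stated, so that choice is an assumption, as are the function name and the synthetic data.

```python
import numpy as np


def flag_extreme_sets(age, acq, k=2.0):
    """Flag scores lying more than k residual SDs from the age regression line."""
    age = np.asarray(age, dtype=float)
    acq = np.asarray(acq, dtype=float)
    slope, intercept = np.polyfit(age, acq, 1)
    resid = acq - (slope * age + intercept)
    return np.abs(resid) > k * resid.std(ddof=1)


# Synthetic illustration only: 218 children, age in months.
rng = np.random.default_rng(3)
age = rng.uniform(106, 168, size=218)
acq = 0.4 - 0.002 * age + rng.normal(0, 0.19, size=218)
acq[0] = 0.4 - 0.002 * age[0] + 0.7  # one planted extreme protocol
flags = flag_extreme_sets(age, acq)
print(int(flags.sum()))
```

Flagged protocols would be candidates for closer inspection, not automatic rejection, since the text notes that reliable limits cannot be fixed without external validity criteria.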

Turning to the question of the correlation between acquiescence 
manifest in the inventory, and suggestibility on the Body Sway test, 
the relationship seems fairly strong as far as the boys are concerned. 
Table 3 shows that whatever array of Body Sway scores we take, 
the correlation with acquiescence is positive. For the girls, however, 
the direction of the correlation is reversed for 3 out of 4 arrays. 

Two main factors contribute to the apparent differences between 
the boys and the girls: a) There are only small numbers of girl 
subjects in the arrays which do not involve the fall category 
(B) and the negative correlations are of little significance; b) The 
mean Acquiescence score of the 5 girls who swayed backward is 
[...] of those in the other two categories. 
[...] positive correlation between Acquiescence 
and Body Sway [...] the array comprising both categories A 
[...] subjects excluding those who sway backward. 
Here [...] is highest for the boys (r = .72, p < .01). 
[...] category (B) as an extension of the for-
ward [...], there is a positive relationship between 
[...] suggestibility in both sexes. 

The [...] backward sway as a measure of sug-
gestibility does not [...] 
above, that those who sway backward are even less suggestible 
than those who sway forward only a little. However, very small 
numbers are involved in this study and the results are merely sug-
gestive. It may be that among children those who resist or are little 
affected by the Body Sway suggestion [...] sway a 
little backward as a little forward. 


Summary 


1. The Junior Maudsley Personality Inventory was administered 
to 218 children in the age range 8 10/12-14 years, and a score of 
Acquiescence (positive response set) was calculated for each inven-
tory. 

2. The Body Sway test was administered to 76 of the subjects. 

3. It was found that Acquiescence declined with age to a signifi-
cant degree. 

4. The Body Sway scores were compared with the Acquiescence 
scores, the latter being corrected for the regression on age. It was 
found that for boys there was a significant positive correlation be-
tween suggestibility, as measured by the Body Sway test, and 
Acquiescence as manifest in the inventory. The evidence for this as-
sociation was more equivocal for girls. 

5. It is suggested that the technique of calculating the Acqui-
escence component of self-rating inventories may be used as a device 
for investigating the validity of individual protocols. 

6. The present data throw further light on the question of the 
significance of different types of response to the Body Sway test 
among children. 


REFERENCES 


Barnes, E. H. “The Relationship of Biased Test Responses to Psy-
chopathology.” Journal of Abnormal and Social Psychology, LI 
(1955), 286-290. 

Bass, B. M. “Authoritarianism or Acquiescence?” Journal of Ab-
normal and Social Psychology, LI (1955), 616-623. 

Berg, I. A. “Response Bias and Personality: the Deviation Hy-
pothesis.” Journal of Psychology, XL (1955), 61-72. 

Berg, I. A. and Collier, J. S. “Personality and Group Differences in 
Extreme Response Sets.” EDUCATIONAL AND PSYCHOLOGICAL 
MEASUREMENT, XIII (1953), 164-169. 

Cohn, T. S. “The Relation of the F Scale to a Response Set to An-
swer Positively.” Journal of Social Psychology, XLVI ([...]), 
[...]133. 

Cronbach, L. J. “Further Evidence on Response Sets and Test De-
sign.” EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, X 
(1950), 3-29. 

Couch, A. and Keniston, K. “Yeasayers and Naysayers.” Journal of 
Abnormal and Social Psychology, LX (1960), 151-174. 

Eysenck, H. J. “Suggestibility and Hysteria.” Journal of Neurology 
and Psychiatry, VI (1943), 22-31. 

Eysenck, H. J. “Response Set, Authoritarianism and Personality 
Questionnaires.” British Journal of Social and Clinical Psy-
chology, I (1962), 20-24. 

Frederiksen, N. and Messick, S. “Response Set as a Measure of 
Personality.” EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, 
XIX (1959), 137-156. 

Furneaux, W. D. “The Prediction of Susceptibility to Hypnosis.” 
Journal of Personality, XIV (1946), 281-294. 

Furneaux, W. D. and Gibson, H. B. “A Children's Personality In-
ventory Designed to Measure Neuroticism and Extraversion.” 
British Journal of Educational Psychology, XXXI (1961), 204-
207. (a) 

Furneaux, W. D. and Gibson, H. B. “The Maudsley Personality 
Inventory as a Predictor of Susceptibility to Hypnosis.” Inter-
national Journal of Clinical and Experimental Hypnosis, IX 
(1961), 167-177. (b) 

Gage, N. L., Leavitt, G. S., and Stone, G. C. “The Psychological 
Meaning of Acquiescent Set for Authoritarianism.” Journal of 
Abnormal and Social Psychology, LV (1957), 98-103. 

Guba, E. B. and Getzels, J. W. “The Construction of an Other-Di-
rectedness Instrument with some Preliminary Data on Validity.” 
American Psychologist, IX (1954), 285 (Abstract). 

Helmstadter, G. C. “Procedures for Obtaining Separate Set and 
Content Components of a Test Score.” Psychometrika, XXII 
(1957), 381-393. 

Jackson, D. N., Messick, S. J., and Solley, C. M. “How ‘Rigid’ is the 
‘Authoritarian’?” Journal of Abnormal and Social Psychology, 
LIV (1957), 137-140. 

Lewis, N. A. and Taylor, J. A. “Anxiety and Extreme Response 
Preferences.” EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, 
XV (1955), 111-116. 

Messick, S. and Frederiksen, N. “Ability, Acquiescence and ‘Au-
thoritarianism.’” Psychological Reports, IV (1958), 687-697. 

Prentice, N. M. “The Comparability of Positive and Negative Items 
in Scales of Ethnic Prejudice.” Journal of Abnormal and Social 
Psychology, LII (1956), 420-421. 


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 4, 1962


PERSONALITY TEST FAKING: EXPRESSED 
WILLINGNESS TO FAKE AS AFFECTED BY 
ANONYMITY AND INSTRUCTIONAL SET 


BERNARD RIMLAND 
U. S. Naval Personnel Research Activity, San Diego¹


THE current emphasis upon controlling faking on personality 
tests has provided little information about testee differences with 
regard to attitude toward faking. It is the purpose of the present
paper to provide data on this point. The data were gathered by the 
clearly non-optimal but nevertheless very simple and expedient 
method of asking the subjects how willing they would have been to 
give biased responses, under operational circumstances, to the test 
they had just finished taking experimentally. The experimental cir- 
cumstances provided an opportunity for analyzing the data with 
regard to the effects of anonymity and prior instructional set. 

The single questionnaire item used for gathering the willingness
to fake data appeared as the last question on an experimental
forced-choice personality test being developed for use in a nationwide
scholarship program. The question asked groups of male students at
a number of colleges directly if they would have been willing to
falsify on the test had they been taking the test operationally in
competition for a four-year tuition-free college education under a
program in which they had already expressed an interest. The
question, to be referred to hereafter as the “Fake?” question, is
presented with its five choices in Table 1.

Since the answer sheets were signed, these replies could be
considered suspect despite assurances given the students that their ...

¹ The opinions and conclusions expressed herein do not necessarily
reflect the opinions of the Bureau of Naval Personnel or the
Department of the Navy.




TABLE 1

Per Cent of Students Expressing Each of Five Degrees
of Willingness to Fake on a Personality Test

If you were applying for one of the four-year scholarships and were given this
questionnaire, how frankly would you reply on the questions?

                                                Retest Group       Other Groups
Response                                     Unsigned    Signed       Signed
                                             (N = 158)  (N = 155)   (N = 878)

1. With complete frankness.                      15         23          21
2. Might give myself benefit of doubt on a
   few borderline questions, but wouldn’t
   deliberately distort replies.                 59         45          49
3. I would try to give the best picture
   of myself wherever this was possible
   without going to extremes.                    19         25          23
4. Since I would be competing with other
   students who might give distorted
   replies in order to be awarded a
   scholarship, I very probably would feel
   it was necessary to give the best
   possible picture of myself also.               5          6           6
5. Wouldn’t hesitate to give untrue
   replies if I felt it would help me.            2          1           1
                                                100%       100%        100%


... this time on an unsigned slip of paper. ... question, the
response slips carried a ...ess and a personal guarantee of
respondent anonymity by their instructor.

The results of both administrations of the question are presented
in Table 1. For comparison purposes, the signed responses for a
randomly chosen group from five other colleges are also presented
in Table 1.




couraged to admit falsification by the rationalization provided for 
them in Response 4. 

While the signed and unsigned columns substantially agree, a few 
students apparently did change their responses. (There is the pos- 
sibility that many students changed responses while the percentages 
remained constant, but this seems rather remote). The observed 
discrepancies could be attributed to either the “signed-unsigned” 
variable, or to the fact that the signed responses were requested at 
the end of an experimental test situation while unsigned responses 
were obtained separately later. Thus, in order to determine the 
effect of anonymity per se, further analysis was necessary. As a 
means of determining whether the immediately preceding test ex- 
perience was likely to have been a factor in changing responses to 
the “Fake?” question, the responses to the question were cross- 
tabulated, for the group of 873 students, with the instructions these 
subjects had followed in taking the experimental test to which the 
“Fake?” question referred.

Five sets of instructions had been used in the original signed
testing: Group I was instructed to respond honestly; Group II was told
to fake; Group III, “Fake, but beware of a ‘lie scale’”; Group IV,
“Respond normally”; Group V, “Respond normally but beware of
‘lie scale.’” Certain questions, including the one on willingness to
fake, were to be answered honestly by all groups. In Table 2 results
are shown for Group I (Honest), for Groups II and III combined
(Fake), and for Groups IV and V combined (Normal). (Inspection
shows the response data for the groups combined to be quite
homogeneous.)
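
As a rough consistency check on the tabled figures, the Table 2 percentages can be pooled across the three instruction groups, weighting each by its group size. The sketch below is illustrative only and is not part of the original analysis; the variable names are hypothetical, and because it works from rounded percentages the pooled values are approximate. The result agrees with the percentages Table 1 reports for the signed responses of the other groups.

```python
# Group sizes and rounded percentages as printed in Table 2.
group_sizes = [130, 271, 472]  # Honest, Fake, Normal

# One row per response to the "Fake?" question (Responses 1 through 5).
table2_percentages = [
    [35, 12, 22],
    [42, 48, 51],
    [18, 27, 22],
    [4, 11, 4],
    [1, 2, 1],
]

total_n = sum(group_sizes)

# Size-weighted average percentage for each response, rounded to whole per cents.
pooled = [
    round(sum(pct * n for pct, n in zip(row, group_sizes)) / total_n)
    for row in table2_percentages
]

print(pooled)  # [21, 49, 23, 6, 1]
```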

The data in Table 2 do not indicate for any of the three groups a
major change in the conclusion regarding the extent of willingness to
fake. However, the data do show a “carry-over effect” in that
students asked to respond frankly apparently tend to perceive
themselves as being candid, as indicated by their selecting Response 1.
Students asked to fake appear to perceive themselves as somewhat
more likely to fake under operational circumstances. The suggestion
inherent in the test instructions thus affects both ends of the
instructional continuum.

The finding that the students’ expressed willingness to fake is
apparently influenced by immediately preceding test instructions
(p < .01, by χ²) precludes attributing the observed (though not
great) differences in responses for the signed and unsigned conditions
solely to the anonymity factor, since immediately preceding
instructional set as well as anonymity could have contributed to the
differences found.


TABLE 2

Willingness to Fake Cross-Tabulated with
Membership in Experimental Group
(N = 873)

                                   Instructions
Response to             Honest        Fakeᵃ        Normalᵇ
“Fake?” question      (N = 130)    (N = 271)    (N = 472)
1 (No)                    35           12           22
2 (Limited)               42           48           51
3 (Limited)               18           27           22
4 (Yes)                    4           11            4
5 (Yes!)                   1            2            1
                         100%         100%         100%

ᵃ Includes “beware of lie scale” group.
ᵇ Includes “beware of lie scale” group, N = 230.
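
The chi-square comparison mentioned above can be sketched from the published table. The example below is only an approximation under stated assumptions: the cell counts are rebuilt from the rounded percentages in Table 2 (so they are fractional and inexact), the function name is hypothetical, and the paper does not say exactly how its χ² was computed. The critical value 20.09 is the standard tabled χ² value for 8 degrees of freedom at the .01 level.

```python
# Group sizes and rounded percentages as printed in Table 2.
group_sizes = {"Honest": 130, "Fake": 271, "Normal": 472}
percentages = {
    "Honest": [35, 42, 18, 4, 1],
    "Fake": [12, 48, 27, 11, 2],
    "Normal": [22, 51, 22, 4, 1],
}


def chi_square(columns):
    """Pearson chi-square statistic and degrees of freedom for a table
    supplied as a list of columns (one list of counts per group)."""
    col_totals = [sum(col) for col in columns]
    n_rows = len(columns[0])
    row_totals = [sum(col[i] for col in columns) for i in range(n_rows)]
    grand_total = sum(col_totals)
    statistic = 0.0
    for col, col_total in zip(columns, col_totals):
        for observed, row_total in zip(col, row_totals):
            expected = row_total * col_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (n_rows - 1) * (len(columns) - 1)
    return statistic, df


# Approximate counts: percentage of each group times the group size.
counts = [
    [pct * n / 100 for pct in percentages[group]]
    for group, n in group_sizes.items()
]

statistic, df = chi_square(counts)
CRITICAL_01_DF8 = 20.09  # tabled chi-square value, p = .01, 8 df

print(f"chi-square = {statistic:.1f} on {df} df")
print("exceeds .01 critical value:", statistic > CRITICAL_01_DF8)
```

With these reconstructed counts the statistic comfortably exceeds the .01 critical value, which is in line with the significance level reported in the text.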

This “carry-over effect” was also found by Sheldon (1959), who
reported that students who took a personality test under both a fake
and an honest set showed a significant order effect. Students who
responded honestly first and then faked showed a greater change in
score than those who faked first and then responded honestly.
Related findings are reported by Palermo and Martire (1960).


... to change by prior instruction-induced response set. This was
interpreted as reflecting the effects of suggestion upon the students’
self-perceptions, and it implies that great care should be exercised
in wording test instructions and in designing experimental
investigations of faking.

The usual cautions regarding the specificity of these conclusions 
should be borne in mind. A very obvious restriction, but one which 
should perhaps be made explicit, is that verbally expressed willing- 
ness to fake, although anonymously expressed, may not be truly 
indicative of actual tendency to fake. The importance of this restric- 
tion is enhanced by the supplementary findings that some subjects 
were influenced, presumably without their awareness, by the in- 
structional set they had been asked to assume. 


REFERENCES 


Palermo, D. S. and Martire, J. G. “The Influence of Order of Ad- 
ministration on Self-Concept Measures.” Journal of Consulting 
Psychology, XXIV (1960), 372.

Sheldon, M. S. “Conditions Affecting the Fakability of Teacher- 
Selection Inventories.” EDUCATIONAL AND PSYCHOLOGICAL MEAS- 
UREMENT, XIX (1959), 207-219. 




EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 4, 1962


THE RELATIONSHIP OF SELECTED PERSONAL 
CHARACTERISTICS TO THE NEEDS OF COLLEGE 
STUDENTS PREPARING TO TEACH 


KARL C. GARRISON 
University of Georgia 


AND 
MARY HUGHIE SCOTT 
National Educational Association 


A rather complete review of factors influencing students’ choice 
of teaching as a career is presented by Chaltas (1957). The review 
revealed the important role of idealism among students planning to 
teach. It was also noted that students were significantly motivated 
by social factors related to teaching. In a previous study the writers 
found that certain needs of prospective women teachers varied with 
different teaching areas (Garrison & Scott, 1961). However, the 
various studies dealing with motives related to the choice of teaching
as a career fail to consider the relationship between certain
background factors and the specific personal needs of those choosing
teaching as a career.


The Problem 


The hypothesis of this study states that the specific personal needs
of prospective women teachers (as measured by the Edwards Personal
Preference Schedule) vary significantly according to selected
personal characteristics of the prospective teachers. The Edwards
Personal Preference Schedule (EPPS) consists of 210 pairs of items,
and is designed to measure the relative importance of 15
psychological needs. Relationships were explored between the needs of
prospective women teachers and the following personal characteristics:
(1) marital status; (2) age; (3) college class standing; (4) number of
siblings; (5) ordinal of birth in family; (6) educational level of
father; (7) educational level of mother; (8) occupation of father;
(9) place of residence during high school; (10) number of years
subjects planned to teach; (11) major reason for college attendance;
and (12) academic average.

The F-tests of the significance of the intergroup differences in
means for each of the 15 personal needs scores of prospective women
teachers were obtained. Complete data were available on 482 college
students enrolled in the College of Education, University of Georgia.
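
The F-test of intergroup differences in means described above can be illustrated with a minimal one-way analysis of variance. This is a sketch under assumptions, not the authors' computation: the function name and the scores are hypothetical and are not data from the study.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance.

    `groups` is a list of lists, each holding the scores of one group.
    """
    all_scores = [score for group in groups for score in group]
    grand_mean = sum(all_scores) / len(all_scores)

    # Between-groups sum of squares: group size times squared deviation
    # of each group mean from the grand mean.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    # Within-groups sum of squares: squared deviations from each group's own mean.
    ss_within = sum(
        (score - sum(group) / len(group)) ** 2
        for group in groups
        for score in group
    )

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical need scores for three groups of subjects (illustrative only).
example_groups = [[48, 50, 52], [50, 52, 54], [55, 57, 59]]
f_ratio, df_between, df_within = one_way_f(example_groups)
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
```

The resulting F ratio would then be compared with the tabled value for the corresponding degrees of freedom at the .05 or .01 level.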


Marital Status and Age


Those subjects who were currently married or had been married
manifested a greater need for achievement, deference, order, and
endurance. Those who were single manifested a greater need for
succorance and heterosexuality. These comparisons are shown in
Table 1. Closely related to these findings are the comparisons of
age groups. Subjects who were oldest indicated a greater need for
achievement, endurance, and aggression. The youngest subjects
indicated a greater need for nurturance.


TABLE 1

                   Married or
                   Previously Married     Single        Significant
                   (N = )                 (N = 418)     Results
Achievement
Order
Deference          51.68
Succorance         54.82
Endurance
Heterosexuality    55.03                  51.43




TABLE 2
Comparison of the Mean Needs of Subjects and Their College Class Standing

                                              Seniors and
                   Sophomores    Juniors      Graduates     Significant
                   (N = 132)    (N = 177)    (N = 173)      Results
Deference            50.65        50.20        52.04        Sg > Jr
Abasement            51.64        50.40        47.86        So > Sg; Jr > Sg
Nurturance           54.97        52.78        50.74        So > Sg
Change               51.14        52.05        54.35        Sg > So; Sg > Jr
Heterosexuality      53.55        54.93        51.69        Jr > Sg


ates indicated a greater need for deference and change, and a smaller 
need for abasement, nurturance, and heterosexuality. 


Years Planned to Teach 


Significant differences in mean scores were found between the
groups of subjects based upon the number of years they planned to
teach. These comparisons are presented in Table 3. Subjects who
planned to teach the shortest time (one to four years) manifested
a greater need for succorance and a smaller need for dominance,
intraception, and endurance. Subjects who planned to teach from one
to four years and then stop showed a significantly greater need for
succorance and heterosexuality than those who planned to teach 10
or more years (intermittently).


TABLE 3
Comparison of the Mean Needs of College Students and the
Years They Plan to Teach

                     1-4          5-9          10+        10+ Conditional
                  (N = 187)    (N = 105)    (N = 187)       (N = 97)
Intraception        49.62        51.83        53.10
Succorance          55.53        53.34        51.78          50.73
Dominance           47.26        48.86        49.79
Endurance           49.01        51.71        52.36          55.10
Heterosexuality     55.19        53.87        51.33


Major Reasons for College Attendance


The reasons given for college attendance were classified into three
groups: (1) to teach, (2) to be informed, and (3) “others.” The
results presented in Table 4 reveal a number of significant differences
in the mean scores. Subjects who came to college to prepare to teach
indicated a greater need for nurturance and endurance. Those who
came for varied reasons classified as “others” indicated a greater
need for autonomy, heterosexuality, and aggression, and a smaller
need for nurturance and endurance. The findings relative to
endurance and heterosexuality are in harmony with those in Table 3,
showing that those who planned to teach five or more years scored
higher in endurance and lower in heterosexuality than those who
planned to teach from one to four years.


TABLE 4

Comparison of the Mean Needs of College Students
and Their Major Reason for College Attendance

                    To Teach    Be Informed    “Other”
                   (N = 190)    (N = 246)     (N = 46)     Results
Autonomy             46.67        48.74        51.07       O > T
Nurturance           54.00        52.26        49.13       T > O
Endurance            52.88        51.71        49.00       T > O
Heterosexuality      52.08        53.77        56.74       O > T
Aggression           50.25        51.22        54.78       O > T


Academic Average

The results presented in Table 5 show significant and consistent
differences between the groups with different scholastic averages.
Students with higher academic averages manifested a greater need
for order, achievement, and endurance, and a smaller need for
abasement and aggression.

Some Family Background Factors 


Comparisons were made between several family background factors
and mean need scores. Significant findings at the .01 or .05 level
of confidence may be summarized as follows:


TABLE 5
Comparison of the Mean Needs of Subjects and Their Academic Average

                          Academic Average
                  60         70         80         90
Order
Achievement      45.50                            52.73
Abasement        53.28      50.66      48.81      48.39
Endurance        46.94      50.81      52.55      57.7
Aggression       53.92      51.98      50.50      49.33




a. Ranked mean on exhibition need score and place of residence
while in high school: rural, 47.06; small town, 50.29; large
town, 50.02; city, 52.37. Significant results: C > R; C > Lt.

b. Ranked mean on autonomy need score and place of residence
while in high school: rural, 51.13; small town, 47.05; large
town, 47.45; city, 48.55. Significant results: R > St; R > Lt.

c. Ranked mean on dominance need score and occupation of
father: lowest, 45.94; middle, 48.83; highest, 50.17. Significant
results: H > L; M > L.

d. Ranked mean on abasement need score and occupation of
father: lowest, 54.16; middle, 50.16; highest, 48.70. Significant
results: L > M; L > H.

e. Ranked mean on autonomy need score and father’s education:
grammar school, 51.24; high school, 45.87; college, 48.64;
graduate school, 45.67. Significant results: gram > grad; gram > H;
C > H.

f. Ranked mean on intraception need score and ordinal of birth
in the family: first, 52.92; second, 52.02; third, 48.90; fourth or
later, 54.83. Significant results: 4 > 3; 1 > 3; 2 > 3.

g. Ranked mean on abasement need score and ordinal of birth in
the family: first, 48.87; second, 50.52; third, 52.86; fourth or
later, 49.04. Significant results: 3 > 1.


Conclusions 


Out of 180 F-tests, 38 were significant. Of this number, 23 were
significant at the one per cent level of confidence, and 15 at the five
per cent level. The number of significant F scores varied markedly
from one personal characteristic to another.


The following generalizations may be drawn from the data pre- 
sented: 


1. Fourteen of the 15 personal needs varied according to one or
more personal characteristics of the subjects. Affiliation was
unrelated to any of the twelve personal characteristics.

2. Two of the twelve personal characteristics (namely, number of
siblings and education of the subjects’ mothers) were unrelated
to any of the 15 personal needs.

3. Significant findings may be observed in the needs of
overlapping groups such as single students, third quarter
sophomores, and younger students.


It should be noted that the Edwards Personal Preference Schedule
forced choices are sometimes meaningless if not ridiculous. The
trends noted from an analysis of the results might well be more
marked if a more valid measure of needs were available and used.


REFERENCES 


Chaltas, John G. “Factors Influencing the Choice of a Teaching
Career: A Review and Analysis of the Literature.” Doctor’s
Dissertation, Teachers College, Columbia University, 1957.

Garrison, Karl C. and Scott, Mary Hughie. “A Comparison of the
Personal Needs of College Students Preparing to Teach in Different
Teaching Areas.” EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT,
XXI (1961), 955-964.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 4, 1962


A NOTE ON RELATIONSHIPS BETWEEN THE YALE 
ANXIETY AND LIE SCALES 


F. N. COX 


University of Melbourne 
Victoria, Australia 


Sarason and his associates (1960) have recently published a de- 
tailed report of their findings concerning the nature and correlates 
of test anxiety in elementary school children. Their three main 
measures have been the Yale General and Test Anxiety Scales for 
children, and a Lie Scale which is incorporated into the general 
anxiety questionnaire. The purpose of this note is to examine rela- 
tionships between these three scales. 


Procedure 


Data for this report were obtained in the course of two other in- 
vestigations (Cox & Leaper, 1959; Cox, 1962). 

The purpose of the first study was to adapt the Yale Anxiety
Scales for use in Australia. Slightly modified¹ forms of the
questionnaires were administered to a large, representative² sample of
3rd, 4th, and 5th grade Melbourne children. The final effective
sample comprised 848 cases.

In the second study the main aim was to investigate the effects
that educational streaming practices have upon scores on the Yale
Scales. The questionnaires were administered to a sample of 266
children attending 4th and 5th grade elementary, middle class
schools in Canberra.

¹ The only alterations made to Sarason’s original items involved
substituting Australian for American idiom. In all, there were seven
minor changes, ... which were made to the General Anxiety Scale.

² The sample specifications included representative proportions of
government, Roman Catholic, and other types of private schools, and
representative proportions of schools in industrial and residential
suburbs.

For the purpose of the present report, these two samples were 
broken down by grade and sex, and product-moment correlation 
coefficients between the three scales were computed by SILLIAC 
(ILLIAC at Sydney). 
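
For readers unfamiliar with the statistic, the product-moment correlation computed for each grade and sex group can be sketched as follows. This is an illustrative example only: the function name is hypothetical, and the scores are invented for five imaginary children rather than taken from the Melbourne or Canberra samples.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation coefficient between two score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical scores (not study data): higher general anxiety scores
# paired with lower lie scale scores give a strongly negative coefficient.
general_anxiety = [10, 14, 18, 22, 26]
lie_scale = [9, 8, 6, 5, 2]

r_general_lie = pearson_r(general_anxiety, lie_scale)
print(f"r = {r_general_lie:.2f}")
```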


Results 


The correlation coefficients are reproduced in Table 1. If rows 1
and 2 are compared, it can be seen that the sizes of the correlation
coefficients between Test and General Anxiety on the one hand, and
Test Anxiety and Lie Scales on the other, are remarkably similar,
the only difference being a completely systematic one in sign.
This suggests that there is a very high, negative correlation between
scores on the General Anxiety and Lie Scales, and this is confirmed
by inspection of the coefficients in the third row.


Discussion 


Before an attempt can be made to interpret these relationships, it
is necessary to refer briefly to the format of the questionnaires.
In both anxiety scales the words YES and NO appear opposite the ...
their order is invariable. Sarason, et al. (1960) report that they were
... in using a format in which a “yes”


TABLE 1
Correlations between Yale General and Test Anxiety Scales and Yale Lie Scale

                         Melbourne Samples                      Canberra Samples
               3rd Grade     4th Grade     5th Grade        4th Grade     5th Grade
Scales        Boys  Girls   Boys  Girls   Boys  Girls      Boys  Girls   Boys  Girls
N                    127    159    138    162    130        74     68     57     67

General &
  Test Anx.
Test Anx.
  & Lie             -.59   -.39   -.54   -.59   -.57      -.37   -.57   -.58   -.59
General Anx.
  & Lie


answer always implied anxiety, but that the most simple of formats 
seemed to be necessary if the questionnaires were to be administered 
to groups of young children. 

The lie scale contains 11 items scattered throughout the General 
Anxiety questionnaire. These items were selected "to fulfill the re- 
quirement that they refer to experiences which are essentially uni- 
versal among children: that is, all children, if they were able to re- 
port their experiences without distortion, would answer ‘yes’ to most 
if not all of the items” (Sarason, et al., 1960, p. 112). 

Given this format, it appears reasonable to argue that the cor- 
relations presented above, particularly the very high negative ones 
between scores on the General Anxiety and Lie Scale, may well be a 
function of acquiescent and position response sets. It should also be 
noted, however, that scores on both anxiety scales have been shown 
repeatedly to be associated significantly, and in theoretically 
predictable ways, with independent measures of the quality of
children’s performance on a wide variety of intellectual and
non-intellectual tasks (Cox, 1960; Cox, 1962; Sarason, et al., 1960).
Consequently, it seems reasonable to conclude that, while response sets
may be operating to an unknown extent in the General and Test
Anxiety Scales, there is evidence that each measure has some
external validity.

In the case of the “lie scale,” the only possible conclusion which
can be drawn from the results presented here was, in fact, suggested
by Sarason and his associates in Appendix A of Anxiety in Elementary
School Children: “when the lie scale is used as a continuous
variate it operates . . . like a negatively scored anxiety questionnaire,
in which the number of ‘no’ answers instead of the ‘yes’ answers are
counted” (Sarason, et al., 1960, p. 300).³ It is suggested, then, that
all other interpretations of the alleged “lie scale” have to be
interpreted in terms of this general conclusion.


³ Sarason and his associates have reported negative correlations
between their anxiety and lie scales, but the coefficients are
consistently lower than those ..., ranging, in the case of test anxiety
and lie scores, from r = -.16 to -.50, and for general anxiety and lie
scores ... to -.66 (Sarason, et al., 1960, p. 113). One possible
explanation for this apparent cultural difference, which was suggested
to the writer by Professor C. A. Gibb, is that the word “ever,” which
is used in all ... items, may have a more unequivocal connotation for
American than Australian school children.


REFERENCES 


Cox, F. N. and Leaper, Patricia M. “General and Test Anxiety
Scales for Children.” Australian Journal of Psychology, XI
(1959), 70-80.

Cox, F. N. “Correlates of General and Test Anxiety in Children.”
Australian Journal of Psychology, XII (1960), 169-177.

Cox, F. N. “Educational Streaming and General and Test Anxiety.”
Child Development, XXXIII (1962), 381-390.

Sarason, S. B., Davidson, K. S., Lighthall, F. F., Waite, R. R. and
Ruebush, B. K. Anxiety in Elementary School Children. New
York: John Wiley & Sons, 1960.




EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 4, 1962


VALIDITY STUDIES SECTION 


Edited by 
WILLIAM B. MICHAEL 
University of California, Santa Barbara 


The Metropolitan Readiness Tests as Predictors of First-Grade
Achievement. BLYTHE C. MITCHELL
The Concurrent and Predictive Validity of an Objective Measure
of Academic Self-Concept. DAVID A. PAYNE
The Validity of a Battery of Creativity Tests in a High School
Sample. VICTOR B. CLINE, JAMES M. RICHARDS, JR., AND
CLIFFORD ABE
The Individual High School as a Predictor of College Academic
Performance. REGINALD L. JONES AND LAURENCE SIEGEL
The Concurrent and Congruent Validities of the Wide Range
Achievement Test. KENNETH D. HOPKINS, JAMES C. DOBSON,
AND O. A. OLDRIDGE
Predicting Grade Point Average at a Small Southern College.
MARY CATHARINE VICK AND JOHN A. HORNADAY


ANNOUNCEMENT REGARDING VALIDITY STUDIES 


The VALIDITY STUDIES SECTION is published twice a year,
once in the Summer issue and again in the Winter issue, for which
the closing dates for receiving manuscripts are February first and
August first, respectively. Although articles between two and eight
printed pages are usually preferred, an occasional exception is made
to publish articles of somewhat greater length.

Considerable flexibility exists concerning format as can be seen
from a study of recently published articles. However, the model
presented in the Spring, 1953, issue of EDUCATIONAL AND
PSYCHOLOGICAL MEASUREMENT still represents a close
approximation to what is customarily published. Reprints of this model
study are still available. The prospective contributor is encouraged
to read the original announcement.

In order that the usual number of articles of other types may
not be reduced, it is necessary to enlarge the journal and to charge
the authors for most of the publishing costs. For a running page
of printed text the cost is fifteen dollars per page with extra charges
for tables and complex material. Each author receives 100 free
reprints.


Manuscripts should be sent to 
William В. Michael 
Professor of Education and Psychology 
University of California, Santa Barbara 
University, California 




EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 4, 1962


THE METROPOLITAN READINESS TESTS AS PREDICTORS 
OF FIRST-GRADE ACHIEVEMENT¹


BLYTHE C. MITCHELL 
Harcourt, Brace & World Inc. 


Purpose. The purpose of this study was to investigate the
predictive validity of the Metropolitan Readiness Tests against the 1959
Revision of the Metropolitan Achievement Tests as the criterion
measure.

Predictor variables. The six subtests of the Readiness battery,
with the number of items indicated, are as follows: 1. Word Meaning
(19), 2. Sentences (14), 3. Information (14), 4. Matching (19),
5. Numbers (24), and 6. Copying (10). The validities of Tests 1-4
combined (66 items), of Test 5, and of the Total test (100 items)
are reported. Tests 1-4 are generally considered as predicting
readiness for reading; Test 5, readiness for arithmetic.

Criterion variables. The Primary I Battery of the Metropolitan
Achievement Tests is designed for use at the end of the first grade.
Four separate measures are provided: 1. Word Knowledge (35
items), 2. Word Discrimination (35 items), 3. Reading (45 items),
and 4. Arithmetic Concepts and Skills (63 items). An assessment of
total reading achievement is provided by the average of Tests 1-3.

Subjects. Complete results on predictor and criterion tests were
available for 1170 pupils in the white and Negro schools of an entire
Virginia county for the school year 1959-60. In order that observed
correlations between predictor and criterion would not be inflated
(raised) by the greater variability of a combined group, results
were obtained separately for white and Negro schools. This report is
concerned largely with the 919 pupils in the white schools of the
county.

¹ The basic data for this study were provided by ... of
Princess Anne County, Virginia. Appreciation is ...
Jr., Director of Testing and Research, for making the data ...

Procedure. The Readiness tests were administered during the 
month of September, 1959. Performance was expressed in raw score 
terms and, for the total of Tests 1-6, as Readiness Status categories 
E (low) through A (high). These categories are set up to contain 
successively 7%, 24%, 38%, 24% and 7% of the distribution of 
scores for the national standardization group. 
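
The 7-24-38-24-7 per cent split can be read as cumulative percentile boundaries at 7, 31, 69, and 93. The sketch below maps a percentile rank in the standardization group to a Readiness Status letter under that reading. It is an illustration only: the published categories are defined on raw Total scores in the test manual, the function and constant names are hypothetical, and the treatment of a rank falling exactly on a boundary is an assumption.

```python
from bisect import bisect_right

# Cumulative upper bounds implied by successive shares of 7, 24, 38, 24, and 7 per cent.
CUMULATIVE_BOUNDS = [7, 31, 69, 93]
CATEGORIES = ["E", "D", "C", "B", "A"]  # E (low) through A (high)


def readiness_category(percentile_rank):
    """Return the status letter for a percentile rank between 0 and 100.

    Assumption: a rank exactly on a boundary is assigned to the higher category.
    """
    return CATEGORIES[bisect_right(CUMULATIVE_BOUNDS, percentile_rank)]


for rank in (3, 20, 50, 80, 99):
    print(rank, readiness_category(rank))
```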

The Achievement tests were administered the following May 
(1960), with results expressed in terms of grade equivalents. 

Results. The predictor-criterion relationships are shown as prod- 
uct-moment correlation coefficients (Tables 2-4) and as bivariate 
charts (Tables 6-9). 

TABLE 1
Reliabilities of the Predictor and Criterion Instruments

Metropolitan Readiness Tests          Metropolitan Achievement Tests
(Alternate-form)                      (Split-half)
Tests 1-4            .83              Test 1. Word Knowledge          .90
Test 5               .84              Test 2. Word Discrimination     .87
Total, Tests 1-6     .89              Test 3. Reading                 .92
                                      Test 4. Arithmetic Concepts

Table 1 gives reliability data on the two instruments as reported
in the respective test manuals. Each of the values shown for the
Metropolitan Readiness Tests represents the median of alternate-form
determinations for six separate groups averaging 195 pupils
each. The reliabilities for the Metropolitan Achievement Tests were
obtained by the split-half method (raised by Spearman-Brown) and
represent the median of determinations in four school systems, with ...

... deviations for all variables are ...
... in the county. The coefficients are ...


TABLE 2

Predictive Validity of Metropolitan Readiness Tests as Found for the 919
First-Grade Pupils in the White Schools of a County System

                     Correlation with Metropolitan Achievement Tests
                            (1959 Edition, Primary I Battery)

Metropolitan                  2. Word               Average     4. Arithmetic
Readiness        1. Word      Discrimi-   3. Read-  Reading     Concepts
Tests            Knowledge    nation      ing       (Tests 1-3) & Skills       Mean¹   S.D.¹
Tests 1-4          .467        .462        .427      .482        .544          53.4     9.3
Test 5             .563        .581        .512      .589        .622          16.1     6.1
Tests 1-6
  (Total)          .558        .557        .511      .578        .632          75.1    15.6
Mean²              1.87        1.99        2.01      1.96        2.31
Standard
  Deviation²        .44         .61         .59       .52         .61

¹ In terms of raw score.
² Grade equivalents. Norm for date of administration is 1.8.

Sex difference in readiness. The usual relation between the readi-
ness of boys and that of girls is reaffirmed by the present data for
the pupils in the white schools of the County. The mean Total
Readiness score of the 488 boys was 73.92, while that of the 431
girls was 76.42. This difference of 2.5 in the means is significant at
better than the .05 level, indicating that the entering girls are more
ready for formal learning than are the entering boys. The difference,
though statistically significant, would appear to be of minor prac-
tical significance, however. The boys’ mean score of 73.9 would have
a percentile rank of 58, and the girls’ mean of 76.4 one of 64, in the
distribution of individual scores for the combined boy-girl national
standardization group.
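
The significance statement for the boy-girl difference can be checked approximately. The sketch below assumes a large-sample two-sample z test, which the author does not name, and uses the standard deviations reported for the Total score distributions (15.82 for boys, 15.14 for girls). The function name `two_sample_z` is illustrative only.

```python
import math

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Large-sample z statistic and two-sided p-value for a difference in means."""
    # Standard error of the difference between two independent means.
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean2 - mean1) / se
    # Two-sided tail area under the standard normal curve.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Boys: n = 488, mean 73.92, S.D. 15.82; girls: n = 431, mean 76.42, S.D. 15.14.
z, p = two_sample_z(73.92, 15.82, 488, 76.42, 15.14, 431)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

Under these assumptions the statistic comes out near 2.4, with a two-sided probability between .01 and .05, which agrees with the reported significance beyond the .05 level.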


TABLE 3

Boy-Girl Comparisons for Correlation between Total Score on Metropolitan
Readiness Tests (September) and Grade Equivalents on Metropolitan Achievement
Tests (May) for the 919 Pupils in the White Schools of a Virginia County


2 Metropolitan Achievement Tests (1959 Edition) 

| 4. od 

| Number 2. Word re, e S 
| of 1. Word  Diserimi. 3. Read- Reading) cere 

| Group Pupils Knowledge nation ing ез 

PEE veer ORE 

Е qus 488 550 535— 4 565 дз 




TABLE 4

White-Negro Comparisons for Correlation between Total Score on Metropolitan
Readiness Tests (September) and Performance on Metropolitan Achievement
Tests (May) for the 1170 First-Grade Pupils in a Virginia County

                 Metropolitan Achievement Tests (1959 Edition)

               Number               2. Word                 (Average    4. Arithmetic
               of       1. Word     Discrimi-   3. Read-    Reading)    Concepts and
Group          Pupils   Knowledge   nation      ing         Tests 1-3   Skills
White Pupils   919      .558        .557        .511        .578        .632
Negro Pupils   251      .553        .466        .475        .583        .617


ae AYE .583 f 

The standard deviation of the Total score distribution for boys
was 15.82, that for girls 15.14.

Readiness and achievement vs. CA. In contrast to the correlations
of .51 to .63 between Readiness test scores and achievement, as
shown in Table 2, Table 5 shows the relation between chronological
age and end-of-first-grade performance on the Metropolitan Achieve-
ment Tests. The highest r for the modal-age group (.091) just
reaches the .01 level of significance. It appears that within the
twelve-month age range for first-grade entrance, achievement is
only very slightly related to age differences.
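
To see why a correlation as small as .091 can reach the .01 level in a group of this size, the threshold value of r can be approximated. The sketch below assumes a two-sided test and a normal approximation to the t distribution, which is reasonable for several hundred cases. The size of the modal-age group is not stated in the text, so the full group of 919 is used as an upper bound on N, and `critical_r` is an illustrative name.

```python
import math

def critical_r(n, z_crit=2.576):
    """Smallest |r| reaching two-sided significance for n pairs.

    Uses r = t / sqrt(t^2 + df) with the normal value z_crit in place of t,
    an approximation that holds well when df = n - 2 is large.
    """
    df = n - 2
    return z_crit / math.sqrt(z_crit ** 2 + df)

# With all 919 pupils the .01 threshold is roughly .085; any smaller
# modal-age subgroup would require a slightly larger r.
print(round(critical_r(919), 3))
```

Because the threshold rises as N falls, an r of .091 that "just reaches" the .01 level is consistent with a modal-age group somewhat smaller than 919.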

The relation between CA and the October score on the Metropoli-
tan Readiness Tests was .126 for the entire age range, .146 for the


TABLE 5

Relation between Chronological Age and End-of-First-Grade Performance
on Metropolitan Achievement Tests for the 919 Pupils in the
White Schools of a Virginia County

                                 Correlation with Chronological Age

Metropolitan Achievement         For entire      For CA's 5-11
Tests (1959 Edition)             CA range        through 6-11 only
Test 1. Word Knowledge           [...]           [...]
Test 2. Word Discrimination      [...]           [...]
Test 3. Reading                  [...]           [...]
Test 4. Arithmetic Concepts
        and Skills               -.001           .087




modal-age range 5-11 through 6-11 only. Thus CA differentiation
within the year's span of normal entrance age is shown to be a
relatively poor measure of the degree of readiness for learning, and
of little use as a predictor of first-grade achievement.

In Tables 6 through 9 the relation between Readiness Status, as 
determined by Total score on the Readiness tests, and end-of-the- 
grade achievement is shown in the form of bivariate distributions. 
The first-grade “success” of pupils in each September Readiness 
Status category is shown by the frequencies (and percentages) at 
each 3-month grade-equivalent interval. In the lower rows of the 
table, the average achievement (median grade equivalent) for each 
readiness category and the proportion reaching the grade norm of 
1.8 are shown. 

The five Readiness Status categories, E through A, correspond 


TABLE 6

Distribution of May Grade Equivalents on Test 1, Word Knowledge,
for Pupils in Each September Readiness Category

                        Readiness Status in September of First Grade

Grade Equivalents     E           D           C           B            A
on Word Knowledge     Poor Risk   Low Normal  Average     High Normal  Superior
(Metropolitan
Test 1)               f  (%)      f  (%)      f  (%)      f  (%)       f  (%)      Total

3.0-3.2                                         1           9  (3)      14 (10)     24
2.7-2.9                                         2  (1)     17  (6)      33 (23)     52
2.4-2.6                             4  (2)     19  (7)     30 (11)      25 (18)     78
2.1-2.3                             1  (1)      8  (3)     30 (11)      14 (10)     53
1.8-2.0                 1  (4)     25 (14)     93 (32)    118 (42)      40 (28)    277
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . *
1.5-1.7                 5 (19)     98 (56)    146 (50)     73 (26)      14 (10)    336
1.2-1.4                15 (56)     35 (20)     20  (7)      7  (2)       1  (1)     78
Below 1.2               6 (22)     12  (7)      3  (1)                              21

Total                  27 (101)   175 (100)   292 (101)   284 (101)    141 (100)   919

Median Grade
Equivalent             1.3         1.6         1.7         1.9          2.4         1.8
Per Cent Reaching
(or Exceeding) Norm    4%          17%         42%         72%          89%         52.7%

* This dotted line divides those who scored at or above norm (1.8) from those who
failed to reach the norm on the May achievement test.




TABLE 7
Distribution of May Grade Equivalents on Test 2, Word Discrimination,
for Pupils in Each September Readiness Category


Grade Readiness Status in September of First Grade · | 


каакы шшш ы I—— —| 
on Word B 
Discrimi- High A 
nation Normal | Superior 
(Metropoli- 
tan Test 2) f (%)| f (%)| Total 
3.9 and over 1 2 АСТЕ 
3.6-3.8 10 (4)| 18 (13)| 31 
3.3-3.5 
3.0-3.2 11 (4)| 23 (16)| 42 
2.7-2.9 20 (7)| 21 (15) | 48 
2.4-2.6 69 (24) | 28 (20) | 124 
2.1-2.3 53 (19)| 20 (14) |111 
18-20 62 (22)| 18 (13)|189 . 
15-17  |3 (п)| 39 (22) | 88 Q9) 38 (13)| 6 (4|19 
12-14 19 (7)| 5 (4) |191 
Below 1.2 1 1 
Total 284 (100) | 141 (100) | 919 
Median 
рн 
iquivalent ‘ ‘ 1.9 
Per Cent aA gi 
Reaching 
Norm 49% 80% 92% 59.6% 
* This dotted line divides those who scored at or above norm (1.8) from those who
failed to reach the norm on the May achievement test.


successively to the Total score ranges 0-39, 40-64, 65-79, 80-89,
and 90-100.
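
The mapping from Total score to Readiness Status can be written out explicitly. The following is a minimal sketch using the five score ranges just listed; the names `READINESS_BANDS` and `readiness_status` are illustrative, and scores are assumed to fall on or between the printed bounds.

```python
# Total-score ranges for the five Readiness Status categories, as listed in the text.
READINESS_BANDS = [
    (0, 39, "E", "Poor Risk"),
    (40, 64, "D", "Low Normal"),
    (65, 79, "C", "Average"),
    (80, 89, "B", "High Normal"),
    (90, 100, "A", "Superior"),
]

def readiness_status(total_score):
    """Return (letter, label) for a Total Readiness score between 0 and 100."""
    for low, high, letter, label in READINESS_BANDS:
        if low <= total_score <= high:
            return letter, label
    raise ValueError("Total score must fall within one of the listed ranges")

# Both the boys' mean (73.92) and the girls' mean (76.42) fall in category C.
print(readiness_status(73.92), readiness_status(76.42))
```

Applied to the group means reported earlier, both sexes fall in the "Average" category, which is in keeping with the remark that the sex difference is of minor practical significance.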

The significant information to be gained from each table is il- 
lustrated by the following statements regarding Table 6, in which 
the criterion is the Word Knowledge subtest: 


Of the 27 pupils who made a Total Readiness score below 40
and were thus classified as “Poor Risks” for learning, only one
reached the national norm of 1.8 at the end of the year. Of the 175
with Readiness scores classified as “Low Normal,” 30, or about
one out of six, reached or exceeded the national norm. Of the
292 with “Average” Readiness scores, 42% achieved at 1.8 or
above; and for the 284 “High Normal” and the 141 “Superior,”
the per cents were 72 and 89, respectively.




The median May grade equivalent of the cases classified by 
the Readiness tests as “Poor Risks” was 1.3, with the succes- 
sively more “ready” pupils averaging 1.6, 1.7, 1.9 and 2.4 (mid- 
dle 2nd grade), respectively. 
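
The percentages quoted in these statements follow directly from the cell frequencies of Table 6. The sketch below recomputes them; the frequencies are transcribed from the rows at or above the 1.8 norm as read from Table 6, and the variable names are illustrative.

```python
# Frequencies at or above the norm (1.8), by Readiness Status category,
# transcribed from the upper rows of Table 6. Column order: E, D, C, B, A.
CATEGORIES = ["E", "D", "C", "B", "A"]
ROWS_AT_OR_ABOVE_NORM = {
    "3.0-3.2": [0, 0, 1, 9, 14],
    "2.7-2.9": [0, 0, 2, 17, 33],
    "2.4-2.6": [0, 4, 19, 30, 25],
    "2.1-2.3": [0, 1, 8, 30, 14],
    "1.8-2.0": [1, 25, 93, 118, 40],
}
CATEGORY_TOTALS = [27, 175, 292, 284, 141]

# Number of pupils in each category reaching the norm (column sums).
reaching = [sum(col) for col in zip(*ROWS_AT_OR_ABOVE_NORM.values())]

# Per cent reaching the norm within each category, rounded to whole per cents.
per_cent = {
    cat: round(100 * hit / total)
    for cat, hit, total in zip(CATEGORIES, reaching, CATEGORY_TOTALS)
}
overall = round(100 * sum(reaching) / sum(CATEGORY_TOTALS), 1)

print(reaching, per_cent, overall)
```

The column sums give 1, 30, 123, 204, and 126 pupils reaching the norm, which correspond to the "only one," "30," 42%, 72%, and 89% cited in the statements above.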


Summary. Test results for 1170 first-grade pupils in a county
school system show the Metropolitan Readiness Tests to be good
predictors of first-grade learning. Correlations of Total Readiness
Score as a predictor with achievement on each of the four subtests of
the Metropolitan Achievement Tests as the criteria range from .51
to .63. No significant differences in validity between boys and girls
or between white and Negro pupils were found.

Bivariate charts show a considerable degree of relationship be- 


TABLE 8
Distribution of May Grade Equivalents on Test 3, Reading,
for Pupils in Each September Readiness Category


Readiness Status in September of First Grade 


Grade E 
Equivalent Poor Low С 
on Reading Risk Normal | Average | Normal 
(Metropol | —— —— |. A 
tan Test3) | f (%)| f (9| f (90 
E l1 9 nM 


1 11 
0 i 
5 (2) 36 
1 (1)| 5 
з (2| 6 (2) n 
6 (3)| 12 (4) 01 
16 (5) 260 
ү 2 (7)| 34 (19) |102 (85)| 98 (35)] 2 клр, * 
Usi ln S tee ee 315 
16 (59)|103 (59)|128 (44) 
4 (15)| 22 (13)| 16 (5) " 


5 (19)| e (9| 2 (D 
27 (100) | 175 (100) | 292 (99) | 284 (100) 


Grade 
Equivalent 
er Cent 
Reaching 


* This dotted line divides those who scored at or above norm (1.8) from those who
failed to reach the norm on the May achievement test.




TABLE 9

Distribution of May Grade Equivalents on Test 4, Arithmetic Computation and
Skills, for Pupils in Each September Readiness Category


Equivalent Readiness Status in September of First Grade 
on Artthme- а 
tic Computa- B 
tion and High n 
Skills (Met- Risk Normal | Superior 
ropolitan ا ا ا‎ 
Test 4) f (9| f (90 
3.9-3.94- 329095406: (4) 
3.6-3.8 v (2) 17 (12) 
3.3-3.5 
3.0-3.2 
2.7-2.9 
2.4-2.6 
2.1-2.3 
1.8-2.0 


енн 
bien 
2799 


* This dotted line divides those who scored at or above norm (1.8) from those who
failed to reach the norm on the May achievement test.


tween the five Readiness Status categories and end-of-year grade
equivalents. Less than ten per cent of the October “Poor Risks”
reached the grade norm in May, and less than ten per cent of those
of “Superior” readiness status failed to reach it.

The Readiness tests would appear to be a useful instrument in
determining the degree of readiness for first-grade learning. The
results may serve (a) as guides in homogeneous grouping for differ-
entiated instruction, and (b) as suggestive of the types of readiness
development needed by various pupil groups.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 4, 1962


THE CONCURRENT AND PREDICTIVE VALIDITY OF AN 
OBJECTIVE MEASURE OF ACADEMIC SELF-CONCEPT¹


DAVID A. PAYNE²
Syracuse University 


The prediction of achievement criteria with noncognitive vari-
ables has generally met with failure. Uhlinger and Stephens (1960),
and Lowell (1952) report no significant relationship between Mc-
Clelland’s TAT motivational measure and grade point average
(r = .05). Similar findings have been reported for the Achievement
Scale of the Edwards Personal Preference Schedule (Bendig, 1958;
Shaw, 1961). Chahbazi (1960) found that several experimental
projective tests were fairly effective predictors of first-term GPA's,
but not of cumulative averages. Studies using anxiety (Grooms &
Endler, 1960) and general personality measures (Witherspoon &
Melberg, 1959) have yielded essentially negative results. In the
latter study, using the GPA’s of 229 first semester freshmen, only
three out of eleven possible correlations with the criterion were
found to be significant (p < .05).

It is likely that the inability of most “personality” instruments 
to predict achievement lies in 1) their attempts to assess too many 
aspects of behavior, 2) the failure of most instrument items to have 
even remote reference to academic behavior, and 3) the lack of a 
theoretical base either for instrument selection or development. The 
importance of theory in assessment has been forcefully demonstrated 


by Guba and Getzels (1955). 


¹ Data for the present study were obtained as part of a larger project, “A
Comprehensive Study of the Motivational [...] Eleventh Grade Students,” under
the direction of William W. Farquhar, conducted at Michigan State University
under a contract with the U. S. Office of Education (Co-operative Project No. [...]).

² Formerly with the Bureau of Educational Research at Michigan State Uni-
versity where this research was conducted.




The purpose of the present research was to investigate the effici- 
ency of an objective theory-based measure of academic self-concept 
in predicting an achievement criterion. Further, account will be 
taken of the caution recently expressed by Kazmier (1961). He 
found a multiple correlation of .345 between certain scales of the 
EPPS and grades using the Wherry-Doolittle test selection method. 
The multiple correlation degenerated to —.058 upon cross-valida- 
tion, using the beta weights obtained from the validation group. The 


importance of using cross-validation groups in predictive studies 
is obvious. 


Methodology 

Instrumentation 
The academic self-concept measure of the present study consisted
of a 48-item rating scale, the Word Rating List (WRL).³ Items were
in the form of one, two, or three words or concepts which the student
was to rate on a four-point scale as he thought his teachers would
in describing him as a student. This procedure brought into play the
“looking-glass” self-concept of traditional social psychology as re-
cently expanded by Brookover (1959). Assuming that self-concept
is an intervening variable, item discrimination was determined by
contrasting statistically-defined under- and overachieving (low and
high motivated) students. Forty-eight items remained after cross-
validation for each sex, with 35 in common. Instrument reliability
estimates ranged from .88 to .93 for various samples of males and
females, and multiple scalogram analysis indicated that the
WRL is essentially a multi-factor instrument (Payne, 1961; Payne
& Farquhar, 1962).


Sample Selection 


Four independent sets of samples for each sex were identified for
study: two sets of samples to serve for study of concurrent validity
(hereafter referred to as Samples I and II) and two for study of
predictive validity (Samples III and IV).

Samples used in the determination of concurrent validity were de-

Investigation Into the Relationship of Socio" 




rived from an original population of approximately 4200 eleventh
grade males and females, by use of the Two Stage Regression Model
recently discussed by Farquhar.⁴ After subjects were eliminated
from the population who fell beyond ± 1 S.E.est relative to the re-
gression of a first to second administration of two different aptitude
measures, the following groups were delineated:

OVERACHIEVERS—Subjects falling at or above 1 S.E.est rela-
tive to the regression of aptitude (Differential Aptitude Test, Verbal
Reasoning subscale) on achievement (cumulative grade point aver-
age for the 9th and 10th grades).⁵

UNDERACHIEVERS—Subjects falling at or below 1 S.E.est.

ACHIEVERS—Subjects falling within ± 14 S.E.est.

GENERAL—This group was composed of a random normal sam- 
ple. 
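
The second stage of this classification can be illustrated with a small computation. The sketch below assumes that GPA is predicted from aptitude by simple linear regression and that each subject is classified by the size of the residual relative to the standard error of estimate. The data are hypothetical, the function name is illustrative, and the width of the "achiever" band is left as a parameter (`achiever_band`, given here as a fraction of S.E.est) because its value is an assumption and not taken from the study.

```python
import math

def classify_achievement(aptitude, gpa, achiever_band=0.25):
    """Label each subject by residual from the regression of GPA on aptitude."""
    n = len(aptitude)
    mean_x = sum(aptitude) / n
    mean_y = sum(gpa) / n
    sxx = sum((x - mean_x) ** 2 for x in aptitude)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(aptitude, gpa))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(aptitude, gpa)]
    # Standard error of estimate with n - 2 degrees of freedom.
    se_est = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    labels = []
    for e in residuals:
        if e >= se_est:
            labels.append("overachiever")
        elif e <= -se_est:
            labels.append("underachiever")
        elif abs(e) <= achiever_band * se_est:
            labels.append("achiever")
        else:
            labels.append("unclassified")
    return labels, se_est

# Hypothetical aptitude scores and grade point averages for eight subjects.
aptitude = [10, 12, 14, 16, 18, 20, 22, 24]
gpa = [2.5, 1.7, 2.4, 2.6, 2.8, 3.0, 2.7, 3.9]
labels, se_est = classify_achievement(aptitude, gpa)
print(labels, round(se_est, 3))
```

In this toy example the first and last subjects earn grades well above what their aptitude scores predict, the second and seventh fall well below, and the remaining four lie on the regression line.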

Two samples (I and II) of the above four groups for each sex 
were identified to achieve replication of the zero-order correlations 
between the WRL and GPA. It was felt that by sampling from four 
achievement classifications, a picture of the concurrent validity of 
the WRL would be formed, which would represent the full range of 
individuals with which the instrument might eventually be used. 

Data for the determination of predictive validity were gathered 
from two random normal samples of each sex (Samples III and IV). 


Analysis Procedures 

Zero-order product-moment correlations were considered evidence
of concurrent validity.

Predictive validity was determined in the following manner: 1)
multiple regression equations were calculated using the Fisher
modification of the Doolittle method (Walker & Lev, p. 331), 2)
based on the beta weights from one sample, GPA predictions were
made for a second sample, 3) actual and predicted GPA's were
correlated. An adaptation of the F test for significant differences in
multiple correlations (McNemar, p. 279) was used to determine if

Economic Status to an Objective Measure of Motivation—The Michigan
State M-Scales,” under the direction of [...], Michigan
State University.
⁴ Paper read at the 1961 meetings of the American [...]
Association in Denver, “A Comparison of [...]
Under- and Overachieving Students.” (Mimeographed)
⁵ Reliability of the achievement criterion, on a [...] point scale, was estimated
to have a median value of .75 for males and .80 for females.




the addition of the WRL yielded a significant increase in prediction
of GPA over single aptitude measures. This adaptation involved
simply treating the zero-order correlation (Aptitude vs. GPA) as
one of the multiple correlations.⁶
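
With only two predictors, the beta weights and the multiple correlation can be obtained directly from the three intercorrelations, and the increase over the single aptitude measure can be tested with the usual F ratio for a difference between nested multiple correlations. The sketch below uses the closed-form two-predictor solution, not the Doolittle procedure the study reports, and takes its input correlations from the male Sample III entries as they read in Table 6 (DAT-VR with GPA .62, WRL with GPA .51, DAT-VR with WRL .42, N = 254). Because those entries are rounded, the results can only approximate the published weights. Function names are illustrative.

```python
import math

def two_predictor_betas(r1y, r2y, r12):
    """Standardized beta weights and multiple R for two predictors."""
    denom = 1 - r12 ** 2
    b1 = (r1y - r2y * r12) / denom
    b2 = (r2y - r1y * r12) / denom
    multiple_r = math.sqrt(b1 * r1y + b2 * r2y)
    return b1, b2, multiple_r

def f_for_increase(r_full, r_reduced, n, k_full, k_reduced):
    """F ratio for the gain in R-squared when predictors are added."""
    gain = (r_full ** 2 - r_reduced ** 2) / (k_full - k_reduced)
    error = (1 - r_full ** 2) / (n - k_full - 1)
    return gain / error

# Predictor 1 = DAT-VR, predictor 2 = WRL, criterion = GPA.
b_dat, b_wrl, multiple_r = two_predictor_betas(r1y=0.62, r2y=0.51, r12=0.42)

# The zero-order aptitude-GPA correlation is treated as a "multiple" R with one predictor.
f_ratio = f_for_increase(multiple_r, 0.62, n=254, k_full=2, k_reduced=1)
print(round(b_dat, 4), round(b_wrl, 4), round(multiple_r, 2), round(f_ratio, 1))
```

On these inputs the aptitude weight is close to .49, the WRL weight close to .30, and the multiple correlation about .68, with an F ratio on 1 and 251 degrees of freedom that is far beyond conventional significance levels.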


Results and Conclusions 


No significant differences in aptitude are noted in the data pre- 
sented in Table 1. This observation holds not only between samples, 
but also across sexes. Differences in the GPA data in Table 2 are in 
the expected direction. Differences in WRL mean scores are also in 
the expected direction. Homogeneity of variance is evident through- 
out the data presented in Tables 1-3. 

The correlations presented in Table 4 indicate, with one exception,
a strong positive relationship between the WRL and GPA. Somewhat
erratic results were obtained from both male and female under-
achieving samples. The obtained results for these discrepant
achievers might be explained on the basis of heterogeneity of un-
controlled variables or errors in sample selection.

Out of 16 potentially significant correlations, 15 were signifi-
cantly different from zero (p < .05). Of these, 12 were significant at
the .01 level.


TABLE 1

Means and Standard Deviations of Differential Aptitude Test (Verbal Reasoning
sub-test) for High School Samples Used in Determining Concurrent
Validity of Word Rating List (after McDonald)

                        Males                     Females
Description          N      X      S.D.        N      X      S.D.
Overachievers
  Sample I           76    21.71   9.35        83    21.46   8.18
  Sample II          72    22.32   9.91        81    21.27   8.08
Underachievers
  Sample I           61    20.56   8.12        73    21.41   8.03
  Sample II          55    20.38   7.65        63    19.97   6.11
Achievers
  Sample I           50    19.52   9.07        50    19.94   8.45
  Sample II          50    21.02   7.97        50    20.26   7.08
General Group
  Sample I          123    19.37   8.32       119    20.92   8.59
  Sample II         115    19.37   8.33       124    19.91   7.50


⁶ The validity of this adaptation was corroborated by Dr. McNemar in per-
sonal correspondence.




TABLE 2

Means and Standard Deviations of Grade Point Averages for High School
Samples Used in Determining Concurrent Validity of
Word Rating List (after McDonald)

                        Males                     Females
Description          N      X      S.D.        N      X      S.D.
Overachievers
  Sample I           76    3.92    .58         83    4.09    .50
  Sample II          72    3.96    .66         81    4.14    .53
Underachievers
  Sample I           61    2.23    .52         73    2.05    .48
  Sample II          55    2.18    .49         63    2.50    .42
Achievers
  Sample I           50    2.85    .64         50    3.26    .49
  Sample II          50    2.96    .56         50    3.26    .55
General Group
  Sample I          123    2.88    .72        119    3.00    .72
  Sample II         115    2.08    .72        124    3.26    .65


In general it is felt that a high degree of concurrent validity has
been demonstrated. This is particularly true when viewed in light
of the essentially negative results summarized in the opening para-
graph of the present study.

Descriptive data for variables used in the predictive phase of the 
present study are presented in Table 5. With three exceptions, simi- 
larity in data is again noted between samples and across sexes. A 


TABLE 3

Means and Standard Deviations of Word Rating List Used in
Determination of Concurrent Validity for Various
High School Samples (after McDonald)


ipti Females 

Description p Males DE N Y 8р 

Overachievers 3 
Sample I 76 35.00 9.30 M 8 к 
Sample II 72 36.36 9.78 : 

Underachievers 8.23 
Sample I 61 295.90 9.74 i pp da 
Sample II 55 23.64 9.91 . 

Achievers 7.84 
Sample I 50 29796 09.83 т ү $72 
Sample II 50 31.86 9.37 

General Group 31.18 991 
Sample I 123 29.46 de A 28.20 8.81 


Sample II 115 29.46 .28 
Semple шб 2946 ОЗЕ 




TABLE 4 


Concurrent Validity of Word Rating List As Determined by Zero-Order 
Correlations With GPA for Various High 


School Samples (after McDonald) 
Description Males Females 
N r N r 

Overachievers 

Sample I 76 .46** 83 .44** 

Sample II 72 .62** 81 .50** 
Underachievers 

Sample I 61 gre 73 21* 

Sample II 55 AT 63 .33** 
Achievers 

Sample I 50 .57** 50 .98* 

Sample II 50 .50** 50 .35* 
General Group 

Sample I 123 .48** 119 .49** 

Sample II 115 E? 124 .41** 


e Saibennlydifret ftom roe tt 
significant difference in WRL variances between males and females
is found. This is not considered crucial, as no direct sex comparisons
are to be made. Significant differences across samples are also noted
in WRL variances and GPA means for females. It is felt that neither
of these differences is of consequence because of the basic purpose
of the study. This is particularly true of the mean GPA’s, because
homogeneity of variance is still present.

Table 6 contains intercorrelations among aptitude, achievement, 
and WRL measures. All correlations are significantly different from 
zero (p < .01), and homogeneity across samples is noted. 


TABLE 5

Means and Standard Deviations of Variables Used in Determining
Predictive Efficiency of Word Rating List


Sample ПІ Sample IV 

Males (N = 254) (N = 261) 
0 x S.D. X S.D. 
DAT-VR a 9.69 30.16 9.09 
nee OL 8.59 19.40 7.43 
2.94 72 2.95 .59 

id (N = 117) (N = 120) 
DAT-VR eA 5.78 29.20 8.22 
GPA : 7.76 19.29 8.06 
3.27 66 2.22 .56 


M Заоа 2.22 — о D 




TABLE 6 


Intercorrelations Among Variables Used In Determining 
Predictive Validity of Word Rating List 


DAT-VR WRL GPA 

DAT-VR 
Sample III “Жз .34 .00 
Sample IV 4x. .52 65 


WRL 
Sample III .42 ^A .42 
Sample IV .43 P .42 


GPA 
Sample III .62 .51 
Sample IV .71 ‚51 
¹ All correlations significantly different from zero at .01 level.

² Values above diagonal are female (Sample III, N = 261; Sample IV, N = 120); below di-
agonal male (Sample III, N = 254; Sample IV, N = 117).


The multiple correlations of Table 7, when compared with the
aptitude-GPA correlations of Table 6, indicate a significant increase
in accounted-for criterion variance. Whether or not this “increase” is
of social significance cannot be answered at the present time. The
failure of male Sample IV to show a significant increase can be ex-
plained by reference to the beta weights of Table 7 and the inter-
correlations of Table 6. Virtually all of the criterion variance has
been accounted for by the DAT-VR. Impressive evidence for the
efficiency and stability of the predictive equations can also be found
in Table 7. Correlations of actual and predicted GPA's equal
(males) or exceed (females) the multiple correlations. It is
concluded that a set of beta weights has been derived which are
generalizable over individuals. Future research will investigate
TABLE 7 
Multiple Correlations and Corresponding Beta Weig nes 
ing List (WRL) and Differenita 
m ea ct DATI) Suse 


Bela Weights 


1 for Prediction. 
1 Aptitude 


Males Multiple R 
Sample III (N = 254) ro A961 (DAT-VE) ЕІ ve i) 
Sample IV (N = 117) 72 .6650 (DAT VI Pe WRL 

Рета) Correlation of Actual with Predicted С: E | 
ema, 

Sample Ш (N = 261) .67* 5168 СЕЛ Г pA uL) 
Sample IV (N = 120) .70* pa Pear ete 


Correlation of Actual with Predic 


1 All Beta weights significantly different from zero at .05 level of sig 


А Sample ГУ. 
? Using Beta weighta derived from Sample Ш on Sample f significance. 
= Different from МАНЫЗ correlation (See Table 6) at < .01 level o! 




the generalizability of these weights over situations and time. 

All of the presented data for the described high school samples in- 
dicate a high degree of validity, both concurrent and predictive, for 
an objective, reliable and theory-based measure of academic self- 
concept. 


REFERENCES 


Bendig, A. W. “Predictive and Postdictive Validity of Need Achieve-
ment [...].” Journal of Educational Research, LII (1958), [...].

Brookover, W. B. “A Social Psychological Conception of Classroom
Learning.” School and Society, VIII (1959), 84-87.

Chahbazi, P. “Use of Projective Tests in Predicting College
Achievement.” EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT,
XX (1960), 839-842.

Grooms, R. R. and Endler, N. S. “The Effect of Anxiety on Academic
Achievement.” Journal of Educational Psychology, LI (1960), [...].

Guba, E. G. and Getzels, J. W. “Personality and Teacher Effective-
ness: A Problem in Theoretical Research.” Journal of Educa-
tional Psychology, XLVI (1955), 330-334.

Kazmier, L. J. “Cross-Validation Groups, Extreme Groups, and the
Prediction of Academic Achievement.” Journal of Educational
Psychology, LII (1961), 195-198.

Lowell, E. L. “The Effect of Need for Achievement on Learning and
Speed of Performance.” Journal of Psychology, XXXIII (1952), [...].

McNemar, Q. Psychological Statistics (Second Edition). New York:
John Wiley & Sons, 1955.

Payne, D. A. “A Dimension of Analysis of the Academic Self-Con-
cepts of Eleventh Grade Under- and Overachieving Students.”
[...] Thesis, Michigan State University, 1961.

Payne, D. A. and Farquhar, W. W. “The Dimensions of an Objec-
tive Measure of Academic Self-Concept.” Journal of Educational
Psychology, in press.

Shaw, M. C. “Need Achievement Scales as Predictors of Academic
Success.” Journal of Educational Psychology, LII (1961), 282-[...].

Uhlinger, C. A. and Stephens, M. W. “Relation of Achievement
Motivation to Academic Achievement in Students of Superior
Ability.” Journal of Educational Psychology, LI (1960), 259-[...].

Witherspoon, P. and Melberg, M. E. “Relationship Between Grade-
Point Averages and Sectional Scores of the Guilford-Zimmerman
Temperament Survey.” EDUCATIONAL AND PSYCHOLOGICAL MEAS-
UREMENT, XIX (1959), 673-674.



EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 4, 1962


THE VALIDITY OF A BATTERY OF CREATIVITY TESTS 
IN A HIGH SCHOOL SAMPLE* 


VICTOR B. CLINE, JAMES M. RICHARDS, JR., AND CLIFFORD ABE
University of Utah 


Problem. This study was conducted to evaluate the degree to 
which academic performance in high school can be predicted from 
a battery of “creativity” tests. A second purpose was to determine 
the degree to which the criterion variance accounted for by the 
creativity tests is independent of the criterion variance accounted 
for by an IQ test. 

Sample. The sample consisted of 161 students (95 males and 66 
females) in a suburban Salt Lake City, Utah high school. These stu- 
dents were selected on the basis of having taken at least two high 
school science courses, and were in their senior year when the bat- 
tery of creativity tests was administered. In the data analysis, the 
two sexes were treated separately. 

Predictor Variables. For each student in the sample an IQ based
on the California Mental Maturity Inventory was obtained from
the permanent school records. In addition to IQ, scores were ob-
tained for each subject on a group of tests selected from those de-
veloped by Guilford (1951), which are presumed to be highly re-
lated to originality. These tests, however, did not constitute a
representative sample of all of Guilford's creativity tests, but
rather are heavily weighted on “Ideational Fluency.” The battery
of creativity tests, in order of administration, was as follows:
Consequences, Word Association, Hidden Figures, Brick Uses, and
Match Problems.

* This research was supported by a contract between the University of Utah and
the U. S. Office of Education.

On the Consequences test, subjects were required to state what


would happen if certain hypothetical events occurred. An example of 
the type involved is: “What would happen if all the iron ore in the 
world disappeared?” Two scores are obtained. The first is the num- 
ber of immediate consequences which is a measure of “Ideational 
Fluency.” The second is the number of responses which imply that 
the subject was thinking in terms of more remote or indirect conse- 
quences. In answer to the above question, the response “Unemploy- 
ment would increase” would be counted in the latter category. This 
score is interpreted as measuring “Originality.” 

On the Word Association test, subjects are required to give as 
many synonyms as possible for each of six words. The score is the 
total number of acceptable synonyms, a measure of “Associational 
Fluency.” 

On the Brick Uses test, subjects are required to list as many differ-
ent uses for a brick as they can. Two scores are obtained. The first
is the total number of uses listed (a measure of “Ideational Flu-
ency”). The second is the number of times a change is made in the
type of use. A change from “To build a house” to “To throw at the
umpire at a baseball game” would be an example. This score is a
measure of “Spontaneous Flexibility.”
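
The two Brick Uses scores can be expressed as a simple count over a subject's ordered list of responses. The sketch below assumes each response has already been assigned by a scorer to a type of use; the category names and the function name `score_brick_uses` are hypothetical and are not taken from the test manual.

```python
def score_brick_uses(use_categories):
    """Return (total uses listed, number of changes in type of use).

    use_categories is the subject's responses in the order given, each
    already coded by the scorer with a type-of-use label.
    """
    total = len(use_categories)
    # A change is counted whenever a response differs in type from the one before it.
    changes = sum(
        1 for previous, current in zip(use_categories, use_categories[1:])
        if previous != current
    )
    return total, changes

# Hypothetical coded protocol: two building uses, one use as a missile,
# then two uses as a weight.
responses = ["building", "building", "missile", "weight", "weight"]
print(score_brick_uses(responses))
```

For this hypothetical protocol the total score is 5 and the change score is 2, one shift from building to missile and one from missile to weight.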

On the Hidden Figures test, a subject is given a series of simple
figures paired with complex patterns. The task for the subject is to
[...] the accompanying complex pattern. The
score is the number of correct responses, a measure of “Figural
[...].”

On the Match Problems test, subjects were given a series of draw-
ings of matches arranged to form a group of triangles. The
task is to remove a given number of matches in such a way that a
given number of triangles is left. The score is the total number of
correct responses, a measure of “Adaptive Flexibility.”

Criterion. A grade point average was computed for each student [...].
This GPA included all [...] classes in the student's last three years of
high school which were given credit towards graduation.

Results. In Table 1, the mean and standard deviation of the cri-
terion and the means, standard deviations, and intercorrelations of the
predictor variables are shown for both sexes. In this table, correlations
for males are shown below the diagonal and for females above the
diagonal.

TABLE 1

Criterion Mean and Standard Deviation and Predictor Means, Standard Deviations,
Intercorrelations and Validities for Both Sexes

Note: Correlations for males (N = 95) are presented below the diagonal and
correlations for females (N = 66) are presented above the diagonal.


Through the use of a program developed for the Burroughs Data- 
tron computer, Beta weights for each predictor for each sex and 
multiple correlations were computed. Two separate analyses were 
made for each sex, one of the creativity battery alone and one of 
the creativity battery plus the IQ test. Results are summarized in 
Table 2. 


TABLE 2
Beta Weights and Multiple Correlations for Both Sexes

                              Males (N = 95)           Females (N = 66)
                           With IQ   Without IQ     With IQ   Without IQ
Word Association            .0778      .1701         .2548      .3426
Hidden Figures              .1468      .2221        -.0082      .0486
Consequences—Immediate      .0244      .0894         .2164      .2323
Consequences—Remote         .1933      .2434         .2128      .2424
Brick Uses—Total            .1351      .1368        -.1460     -.1278
Brick Uses—Change          -.0892     -.0568         .0368      .0977
Match Problems              .2359      .2748         .1955      .2426
IQ                          .3220       --           .3397       --
Multiple r                  .69        .65           .68        .63


These results indicate that the creativity tests in this battery do
have considerable validity as predictors of academic performance,
and that the criterion variance accounted for by the creativity tests
is to a substantial degree independent of the variance accounted for
by the IQ test. Further evidence of the validity of the creativity tests
is provided by the fact that for each sex the multiple correlation for
the creativity battery alone is nearly as high as the correlation for
the creativity battery plus IQ, and certainly higher than the first order
validities for the IQ test. There is also evidence in the beta weights
in Table 2 of potentially important sex differences in the intellectual
abilities involved in successful performance in high school.
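
The comparison of the two multiple correlations for each sex can be put in terms of criterion variance. The sketch below uses the multiple r values reported in Table 2 and computes how much additional variance is accounted for when IQ is added to the creativity battery. The function name is illustrative. The converse quantity, the variance the creativity tests add beyond IQ alone, would require the zero-order validity of IQ from Table 1 and is not computed here.

```python
def r_squared_gain(r_with_iq, r_without_iq):
    """Additional proportion of criterion variance accounted for by adding IQ."""
    return r_with_iq ** 2 - r_without_iq ** 2

# Multiple correlations from Table 2 (with IQ, without IQ).
gain_males = r_squared_gain(0.69, 0.65)
gain_females = r_squared_gain(0.68, 0.63)
print(round(gain_males, 4), round(gain_females, 4))
```

On these figures IQ adds roughly five to seven per cent of criterion variance beyond the creativity battery, which is the sense in which the battery alone is "nearly as high" as the battery plus IQ.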

Finally, the results of this study tend to confirm the importance 
of Guilford’s work, and to indicate that, contrary to what has been 
contended by some critics, the tests developed by Guilford are 
measuring intellectual characteristics not fully represented in the IQ.


REFERENCES 


Guilford, J. P. “A Factor Analytic Study of Creative Thinking: I.
Hypotheses and Description of Tests.” Report of the Psychologi-
cal Laboratory. Los Angeles: University of Southern California,


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT
Vol. XXII, No. 4, 1962


THE INDIVIDUAL HIGH SCHOOL AS A PREDICTOR OF 
COLLEGE ACADEMIC PERFORMANCE 


REGINALD L. JONES AND LAURENCE SIEGEL
Miami University 


The usual studies forecasting student performance in college
correlate a number of predictors (usually test scores and indices of 
high school performance) with some criterion (usually first year 
college grades). Multiple correlation techniques are then used to 
combine the predictors to obtain optimal prediction of the criterion. 
Sometimes an attempt is made to develop separate batteries for 
various divisions within a school; at other times predictions by 
division are made from a common battery of predictors (Eells, 1961).
Most often predictions are made across university divisions with
a single test battery.

A rather infrequent technique has involved development of pre-
diction equations for individual high schools using a common battery
of predictors (Burnham, 1959). The purpose of this study was to
test the anticipated results of this procedure: i.e., we hypothesized
that predictions of college grades would be significantly improved
when separate predictor-criterion correlations were developed for
each originating high school.


Procedure 

The subjects were 1,031 students enrolling in Miami University 
from 15 high schools. The samples, ranging in N from 53 to 93, consisted
of students enrolled in Miami during the period between [year illegible]
and 1955 who had completed the first two consecutive semesters of
college work. No distinctions by sex or university divisional enrollments
were made.

The predictors were ACE Composite scores (Educational Testing
Service, 1948) and scores on The American Council on Education's
Cooperative Mathematics Test (Educational Testing Service, 1939)
and Cooperative English Test (Educational Testing Service, 1940).

The criterion was first year grade point average. In all instances, 
this grade average was based on the completion of 30-38 semester 
hours of college work completed under a variety of college instruc- 
tors in a variety of courses. 

The range of student performance on predictors and criterion is 
reflected in the means and sigmas reported in Table 1. 

It should be pointed out that these data in no way reflect upon 
the quality of the individual high schools, since it is possible that
students from a given school may not be representative of the
population of that school. The data can only represent those students
from a given school who elect enrollment in Miami University. Stu- 
dents high on predictor and criterion measures are found in all 
schools. On the other hand, there were noticeable differences in aver- 


TABLE 1
Means and Standard Deviations of Predictors and Criterion

                             Predictors                               Criterion
                                                     High School      First Yr.
              ACE      Coop. Eng.   Coop. Math.¹     Rank¹            Grades²

School   N    M   SD    M   SD       M   SD           M   SD           M   SD


A 55 1113 208 1513 40.6 56.6 
3 1513 40.6 56.6 9.8 48.5 22.6 2.36 .68 
© 82 ies ui 1642 822 524 7.1 753 10.7 1.98 .75 
D 93 1004 a 1485 35.7 542 11.8 53.9 304 1.92 .95 
E 81 1139 125 114 358 542 117 57.5 28.1 2.02 8l 
F 78 1052 188 1543 312 556 95 бїт 248 205 1 
G Тр 1052 233 1522 387 554 08 4&7 216 188 .86 
H 65 107.0 oy 160.1 42.7 56.9 10.6 43.9 24.6 2.21 .72 
I 57 1000 216 145 318 039 83 464 250 204 80 
J 59 1081 127 1084 427 603 87 392 229 2.73 6l 
K 53 955 210 1061 324 543 116 50.5 23.5 1.88 77 
L 88 1042 110 1565 274 545 06 506 210 1.92 79 
M 83 105.2 254 147.6 357 548 90 41.1 23.1 1.87 .91 
N 1 1100 124 1820 433 553 94 413 246 2.25 94 
o 55 1095 172 1910 310 564 7.5 583 206 225 .72 
5 197 1524 284 525 97 432 246 210 .75 
Com- 
posite 1031 1081 26.1 1551 40.0 55.1 96 бїз 35.1 209 81 
¹ Standard scores.
² 4.00 = A, 3.00 = B, etc.




age scores on predictors and criterion for individual high schools. 

Multiple correlations (Wherry-Doolittle Test Selection Tech- 
nique) were computed for the predictor-criterion correlation ma- 
trices for the individual high schools and for the composite of sub- 
jects across high schools. Differences between the R obtained from 
the composite sample and R's based on data from individual high 
schools were tested for statistical significance. 


Results and Discussion 


Multiple correlations for individual high schools are presented in 
Table 2. Also included in this table are data on the percentage of 
variance accounted for by each predictor. 

It is obvious that the predictors are not uniformly predictive of 
performance across high schools. In only 5 of the 15 high schools did 
all predictors make a unique contribution to prediction of the cri- 
terion variance. Three predictors were needed for optimal prediction 
in 4 of the high schools and two predictors were needed for optimal 
prediction of the criterion for 5 other schools. In one instance, only 
one variable was needed for optimal prediction of first year grades. 
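The link between the variance percentages and the multiple correlation can be checked with a few lines of arithmetic. The sketch below assumes, as the Table 2 layout suggests, that the per-predictor percentages sum to the total percentage of variance predicted and that the multiple R is the square root of that total expressed as a proportion. The School A figures are read from Table 2; the function name is illustrative.

```python
import math


def multiple_r_from_variance_shares(percentages):
    """Sum per-predictor variance percentages and convert the total to R."""
    total_percentage = sum(percentages)
    return total_percentage, math.sqrt(total_percentage / 100.0)


# School A: H.S. rank, ACE, Coop. Math, Coop. English (percent of variance).
school_a = [25.93, 6.13, 13.29, 0.06]
total_pct, r = multiple_r_from_variance_shares(school_a)
print(round(total_pct, 2), round(r, 2))  # 45.41 0.67
```

The result agrees with the total of 45.41% listed for School A in Table 2 and with the individual-school R of .67 listed for School A in Table 3.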


TABLE 2

Multiple Correlations Between Optimal Combinations of Predictors and
First Year Cumulative Grade Point Average at Miami

           Percentage of Variance Accounted
           For By Each Predictor                      Total
                                                      Percentage     Multiple
High       H.S.     Total    Coop.    Coop.           of Variance    Corre-
School     Rank     ACE      Math     English         Predicted      lation


A 25.93% 6.13% 13.29% 0.06% 45.41% b 
B 55.06 zc ee МЕ 55.08 T 
c 23.98 13.03 7.26 = 4 s 
D 26.75 13.88 11.09 — s in 
2 2426 19.35 548 5.18 тл u 
6.7 20.27 = = А 

G 1810 19.17 6.77 15.26 9:80 T 
г 1M a ТТ ОН y 54.73 4 
I 22.90 9.47 7.60 15.46 un d 
J 12.07 17.55 = — n Te 
K 22.40 7.46 = — ps Te 
L 17.25 22.76 — — SNC Ar 
M 10.74 9.94 10.50 — yes bs 
N 17.08 I. 25.46 5-96 sx m 
о 20:25 12.90 2.69 1.10 : 

2 32.26 57 


All Schools 9.48 13.08 6.15 .65 




When the 1,031 subjects were combined without consideration for 
originating high school, all variables contributed to the prediction 
of criterion variance. Note, however, that in no instance was the R 
based on the composite population a significant improvement over 
the R resulting from predictions based on the single school. Con- 
versely, predictions made for students from individual schools did 
show significant improvements over the R for the composite sample 
in 9 of 15 instances (see Table 3). 


TABLE 3
Significance Tests of Differences Between Multiple Correlations

                   Multiple Correlation
                                Individual
High School      Composite      High School      t        p¹
A 57 .67 2.33 < .0100 
B 57 .74 5.03 < .0001 
(6 57 .66 2.20 <.0100 
р 57 .72 4.81 < .0001 
Е 57 ‚74 5.59 < .0001 
F 57 51 —0.98 N.S, 
G 57 72 4.33 < .0001 
H 57 :67 2.30 < .0100 
1 57 74 4.79 <.0001 
x 57 .54 —0.47 N.S. 
£ 57 .55 —0.31 N.S. 
М 57 .63 1.36 N.S. 
N 57 .56 —0.19 N.S. 
a .57 70 2.96 < .0010 
Composite 57 a = -— 


¹ One-tailed test; p ≤ .05 adopted as a critical value.


The results of the study are interpreted as supporting the initial
hypothesis that significant improvements in predicting college
grades are made when data from each originating high school are
treated separately.


REFERENCES 


Cooperative Test Division, Educational Testing Service. Coopera-
tive Mathematics Tests: Survey Test in Mathematics. New York:
Cooperative Test Division, Educational Testing Service, 1939.

Cooperative Test Division, Educational Testing Service. Directions
for Using the Cooperative Tests: Cooperative English Test. New
York: Educational Testing Service, Cooperative Test Division,
1940.

Cooperative Test Division, Educational Testing Service. Manual of
Instructions. American Council on Education Psychological Ex-
amination for College Freshmen. New York: Cooperative Test
Division, Educational Testing Service, 1948.

Burnham, Paul S. “The Assessment of Admissions Criteria.” The
Association of College Admission Counselors Journal, IV (1959),

Eells, Kenneth. “How Effective is Differential Prediction in Three
Types of College Curricula?” EDUCATIONAL AND PSYCHOLOGICAL
MEASUREMENT, XXI (1961), 459-471.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT
Vol. XXII, No. 4, 1962


THE CONCURRENT AND CONGRUENT VALIDITIES
OF THE WIDE RANGE ACHIEVEMENT TEST

KENNETH D. HOPKINS, JAMES C. DOBSON 
University of Southern California 


AND 
O. A. OLDRIDGE 
University of British Columbia 


Background. The Wide Range Achievement Test (WRAT) was
originally published in 1946. Because of inadequate data regarding
the test’s validity, its questionable norming procedures, and the un-
substantiated claims of clinical “signs” offered by its authors, the
test received generally unfavorable reviews. Since that time the
test seems to have been ignored by testing authorities because of
these technical inadequacies. None of the standard texts on testing,
such as those by Cronbach, Anastasi, or Thorndike, even mention
the test. The test, however, appears to have gained widespread use;
the WRAT is listed as the second most frequently used achievement
test in psychological clinics (Sundberg, 1961). It is widely used as a
test of reading by school psychologists and in reading clinics. Why
has the test gained this acceptance by practitioners in light of the
inadequate information available on validity and its other technical
deficiencies? The reasons seem to lie in the test's applicability to a
wide ability span (kindergarten to superior adult), its ease of ad-
ministration (the reading test usually requires less than five minutes
per examinee), and its virtual elimination of chance variance (recall
format). Its users also appear to have an “impressionistic” convic-
tion that the results agree closely with relevant criteria. The present
writers were, however, unable to locate a single empirical study per-
taining to the WRAT.

Problem. The purpose of this study was to evaluate the reading
test of the WRAT in terms of its concurrent and congruent validity.
(A comparison of the WRAT with another widely used, longer,
more adequately standardized reading test in terms of concurrent
validity was proposed.)

Procedures. The concurrent validity of the WRAT was assessed 
by correlating WRAT results with independent teacher rankings of 
reading ability. It was felt that teacher rank would be a superior
criterion to school marks because of the increase in precision, elimi-
nating to some extent the contamination of variance in the grading
philosophies of the teachers. The ranking was done in the presence of
a school psychologist or psychometrist without any prior announce-
ment in order to obviate the possibility of criterion contamination
that would have been introduced if opportunity to consult previous
testing records was provided. These ranks were then transformed
to normalized standard scores using the table provided by Walker 
and Lev (1958). Within the week the WRAT was administered to 
all experimental classes by the school psychologist, psychometrist, 
or remedial reading specialist. A Pearson correlational analysis was 
performed through use of an IBM 7090 computer. 
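The rank-to-score transformation can be sketched in a few lines. The authors used the table in Walker and Lev (1958); the version below is only an approximation of that kind of table and rests on two assumptions that are not in the article: each rank is converted to the proportion of the class falling below its midpoint, and the result is scaled to a mean of 50 and a standard deviation of 10. The function name is hypothetical.

```python
from statistics import NormalDist


def ranks_to_normalized_scores(ranks, mean=50.0, sd=10.0):
    """Convert teacher ranks (1 = best reader) to normalized standard scores.

    Assumption: rank k in a class of n corresponds to the proportion
    (n - k + 0.5) / n of the class lying below the pupil.
    """
    n = len(ranks)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    scores = []
    for rank in ranks:
        proportion_below = (n - rank + 0.5) / n
        z = standard_normal.inv_cdf(proportion_below)
        scores.append(mean + sd * z)
    return scores


# A hypothetical class of five pupils ranked 1 (best) to 5.
scores = ranks_to_normalized_scores([1, 2, 3, 4, 5])
print([round(s, 1) for s in scores])
```

Normalizing in this way lets ranks from classes of different sizes be pooled before correlating them with test scores.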

The sample consisted of all pupils in grades 1-5 in schools where 
means have closely paralleled the national means in achievement 
and intelligence on previous standardized test results. There was a
sample of 502 students: 90, 106, 171, 49, and 86 in grades 1-5,
respectively.

Results. The Pearson correlation coefficients showing the concur-
rent validity of the WRAT together with grade placement means
are given in Table 1.


TABLE 1
Means and Coefficients of Concurrent Validity for the WRAT by Grade
[Table values illegible in source.]

* WRAT was given during the 6th school month except in grade 5, where it was given during
the 1st month.
** Mean CTMM Total IQ's were 107.1, 104.0, and 101.5 in grades 3-5, respectively.


The California Reading Vocabulary (RV) and Comprehension
(RC) Tests (CRT) of the 1957 edition of the California Achieve-
ment Tests had been given one week previous to the WRAT in some
of the schools in grades 3 and 5 (N’s of 90 and 86, respectively);
therefore, it was possible to assess the congruent validity of the
WRAT at these grades and also to compare their relative concur-
rent validities. Table 2 gives these validity coefficients. A t-test for
correlated correlations given by Walker and Lev showed the relative
concurrent validities were not significantly different.


TABLE 2
Congruent and Concurrent Validity Coefficients for the WRAT and the CRT

                 California Reading Test
                                                  Teacher
                 RV       RC       Total          Rank         t
Grade 3:
  WRAT           .825     .836     .8683          .906
                                                               .480
  CRT-Total                                       .879
Grade 5:
  WRAT           .673     .668     .712           .851
                                                               .253
  CRT-Total                                       .875
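A t-test for two correlations that share one variable (here, each reading test correlated with the same teacher ranking) can be sketched as below. The formula is Hotelling's test for dependent correlations, with N - 3 degrees of freedom; that this is the exact formula the authors took from Walker and Lev is an assumption, and the sketch is not claimed to reproduce the t values in Table 2. The function name and the example correlations are hypothetical.

```python
import math


def hotelling_t(r_ac, r_bc, r_ab, n):
    """Test whether r_ac and r_bc differ when a, b, c come from one sample.

    r_ac, r_bc: correlations of tests a and b with the shared criterion c.
    r_ab: correlation between the two tests.
    Returns (t, degrees of freedom).
    """
    # Determinant of the 3 x 3 correlation matrix of a, b, and c.
    determinant = (
        1.0 - r_ac**2 - r_bc**2 - r_ab**2 + 2.0 * r_ac * r_bc * r_ab
    )
    t = (r_ac - r_bc) * math.sqrt((n - 3) * (1.0 + r_ab) / (2.0 * determinant))
    return t, n - 3


# Hypothetical values, not taken from the article.
t, df = hotelling_t(r_ac=0.50, r_bc=0.40, r_ab=0.30, n=103)
print(round(t, 3), df)
```

The test takes the correlation between the two tests into account, which an independent-samples comparison of the two validity coefficients would ignore.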

Discussion and Conclusions. Although the WRAT correlated 
highly with teacher rankings at all grade levels investigated, the 
coefficients in grades 3, 4, and 5 were higher than those in grades 
1 and 2. It is the writers’ opinion that this results, in part, from in- 
creased criterion reliability resulting from increased reading vari- 
ance by grade. 

It is interesting to note that the WRAT, which has no comprehen-
sion section, did correlate as highly with the RC section of the CRT
as it did with the RV section. The common variance between the
CRT and the WRAT was less at grade 5 than at grade 3; however,
common variance between the tests and the criterion for concurrent
validity changed little. Both the WRAT and the CRT possessed sub-
stantial concurrent validity, neither being superior.


REFERENCES

Sundberg, N. D. “The Practice of Psychological Testing in
Clinical Services in the United States.” American Psychologist,
XVI (1961), 79-83.

Walker, Helen M. and Lev, Joseph. Elementary Statistical Methods
(Revised Edition). New York: Henry Holt and Company, 1958.


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 4, 1962


PREDICTING GRADE POINT AVERAGE 
AT A SMALL SOUTHERN COLLEGE! 


MARY CATHARINE VICK


AND 
JOHN A. HORNADAY 
Greensboro College 


The Problem: The purpose of this study was to determine the 
predictive validity of a battery of three standardized college en- 
trance tests and high school rating against a criterion of freshman 
grade point average at Greensboro College, as an example of a lib- 
eral arts, church-related college. 

Predictive Variables: The predictive variables used were: The 
Scholastic Aptitude Test (SAT) of the College Entrance Examina- 
tion Board, Verbal and Mathematics scores; the Cooperative School 
and College Ability Test, Form 14 (SCAT), Verbal, Quantitative, 
and Total converted scores; the Cooperative English Test, Form 1A 
(CET), Vocabulary, Level, Speed, Total Reading, Expression, and 
Total English scores; the high school rank converted to a T-score 
according to the method recommended by Duggan and Hazlett 
(1961); and high school grade point average. Thus, there were thirteen
separate predictors considered.

Criterion Measure: The criterion measure was the student’s grade
point average at the end of the first year of college work. High
school and college GPA’s were computed on the basis of A = 4,
B = 3, C = 2, D = 1, F = 0 honor points. Physical education and
other non-academic courses were not considered in computing GPA.


1 Appreciation is expressed to Mark W. Kennedy, who assisted in the
performance of desk calculations in this study. Correlations were
computed through the facilities of the IBM 7070 computer at Duke
University, Durham, North Carolina.


TABLE 1
Pearson Product-Moment Correlation Coefficients*
[Correlation matrix, means, and standard deviations illegible in source.
Variables: 1. College GPA; 2. High school rank; 3. High school GPA;
4. SAT/V; 5. SAT/M; 6. SCAT/V; 7. SCAT/Q; 8. SCAT/T; 9. CET/Vocabulary;
10. CET/Level; 11. CET/Speed; 12. CET/Expression; 13. CET/Total Reading;
14. CET/Total English.]

* Decimal points omitted in correlations.


TABLE 2
Validity Studies (Decimals omitted)
[Table body and footnotes illegible in source.]


Sample: Subjects were 164 women entering college for the first 
time in the fall of 1960; this number represents the entire number of 
women entering the college that year for whom test scores were 
available. A separate group of 100 women, selected randomly from 
the freshman class of 1961, was used as a cross-validation group. 

Results and Discussion: Intercorrelations, means, and standard
deviations are reported in Table 1. The Doolittle method was used
for calculating R’s for high school rank and CET/Total both with
and without SAT and SCAT. The R’s were increased less than .02
by the scholastic aptitude tests because of the high correlations
with CET. Thus the multiple prediction formula was derived on the
basis of two predictors and became:


GPA = .0462X₁ + .0857X₂ - 6.1490


where X₁ = high school converted rank score and X₂ = CET/Total.
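Applying the prediction formula is a one-line calculation, sketched here. The coefficients are those given in the formula; the function name and the example scores (a converted rank of 55 and a CET/Total of 60) are hypothetical and are not taken from the study.

```python
def predicted_gpa(hs_rank_t_score, cet_total):
    """Predicted freshman GPA from the two-predictor formula in the text."""
    return 0.0462 * hs_rank_t_score + 0.0857 * cet_total - 6.1490


# Hypothetical applicant: converted rank score 55, CET/Total 60.
print(round(predicted_gpa(55, 60), 3))  # 1.534
```

The prediction is on the honor-point scale used for the criterion, so a value near 2 corresponds to a C average.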
Cross-Validation Data: A new group of 100 randomly selected
females who entered college for the first time in the fall of 1961 was
used as a cross-validation group. For this group the predicted GPA
was correlated with the GPA actually obtained during their fresh-
man year. The correlation dropped somewhat below the original R,
becoming .63. This drop reflects increased use of the predictive vari-
ables by the Admissions Committee in its selection program.
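The cross-validation step amounts to scoring the new group with the formula derived on the original group and correlating those predictions with the grades actually earned. The sketch below shows the procedure under stated assumptions: the coefficients come from the prediction formula in the text, while the function names and the five student records are invented for illustration and are not data from the study.

```python
import math


def predicted_gpa(hs_rank_t_score, cet_total):
    """Two-predictor formula derived on the original (1960) group."""
    return 0.0462 * hs_rank_t_score + 0.0857 * cet_total - 6.1490


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical cross-validation records: (rank score, CET/Total, obtained GPA).
new_group = [(45, 95, 3.2), (52, 92, 3.1), (60, 98, 3.9), (40, 90, 2.4), (55, 96, 3.0)]
predictions = [predicted_gpa(rank, cet) for rank, cet, _ in new_group]
obtained = [gpa for _, _, gpa in new_group]
print(round(pearson_r(predictions, obtained), 2))
```

Because the weights are fixed before the new group is scored, this correlation is not inflated by fitting to the same sample, which is why some shrinkage from the original R is expected.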
Twelve validity studies are reported in Table 2 for comparison
purposes. The table includes all such studies published in Educa-
tional and Psychological Measurement from 1951 through Summer,
1962, using these predictors. The results are similar to those ob-
tained here, with the exception of the relative emphasis placed upon
the Cooperative English Test Score in the present study. It is of
interest to note that generally higher correlations are obtained for
women than for men and for private than for public institutions.


REFERENCES 


Duggan, John M. and Hazlett, Paul H., Jr. Predicting College
Grades. New York: College Entrance Examination Board, 1961.

Eells, Kenneth. “How Effective Is Differential Prediction in Three
Types of College Curricula?” EDUCATIONAL AND PSYCHOLOGICAL
MEASUREMENT, XXI (1961), 459-471.

Examiner's Manual, Cooperative School and College Ability Tests,
Second Supplement. Princeton, New Jersey: Educational Testing
Service, Cooperative Test Division, 1956.

Franz, Gretchen, Davis, Junius A., and Garcia, Dolores. “Prediction
of Grades from Pre-admissions Indices in Georgia Tax-Supported
Colleges.” EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT,
XVIII (1958), 841-844.

Guilford, J. P. Fundamental Statistics in Psychology and Education
(Third Edition). New York: McGraw-Hill Book Company, Inc.,
1956.

Horst, P. A. Relationship between Preadmission Variables and Suc-
cess in College. Seattle, Washington: University of Washington,
Division of Counseling and Testing Services, 1959.

Klugh, Henry E. and Bierley, Robert. “The School and College
Ability Test and High School Grades as Predictors of College
Achievement.” EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT,
XIX (1959), 625-626.

Lewis, John W. “Comparing Zero-Order Correlation from SCAT
Total and Multiple Correlation from SCAT Q and V at Southern
Illinois University.” EDUCATIONAL AND PSYCHOLOGICAL MEASURE-
MENT, XXII (1962), 397-398. (a)

Lewis, John W. “Utilizing the Stepwise Multiple Regression Pro-
cedure in Selecting Predictor Variables by Sex Group.” EDUCA-
TIONAL AND PSYCHOLOGICAL MEASUREMENT, XXII (1962), 401-
404. (b)

Mann, Sister M. Jacinta. “The Prediction of Achievement in a Lib-
eral Arts College.” EDUCATIONAL AND PSYCHOLOGICAL MEASURE-
MENT, XXI (1961), 481-483.

Michael, William B., Jones, Robert A., Cox, Anna, Gershon, Arthur,
Hoover, M., Katz, Kenneth, and Smith, Dennis. “High
School Record and College Board Scores as Predictors of Success
in a Liberal Arts Program During the Freshman Year of Col-
lege.” EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, XXII
(1962), 399-400.

Pierson, George A. and Jex, Frank B. “Using the Cooperative Gen-
eral Achievement Tests to Predict Success in Engineering.” EDU-
CATIONAL AND PSYCHOLOGICAL MEASUREMENT, XI (1951), 397-
402.

Spaulding, Helen. “The Prediction of First-Year Grade Averages in
a Private Junior College.” EDUCATIONAL AND PSYCHOLOGICAL
MEASUREMENT, XIX (1959), 627-628.

Swanson, Edward O. and Berdie, Ralph F. “Predictive Validities in
an Institute of Technology.” EDUCATIONAL AND PSYCHOLOGICAL
MEASUREMENT, XXI (1961).

Webb, Sam C. and McCall, John N. “Predictors of Freshman Grades
in a Southern University.” EDUCATIONAL AND PSYCHOLOGICAL
MEASUREMENT, XIII (1953), 660-663.


Averages in 
'HOLOGICAL 


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 4, 1962


BOOK REVIEWS 


Edited by 
WILLIAM B. MICHAEL 
University of California, Santa Barbara 


Getzels and Jackson's Creativity and Intelligence: Explora-
tions with Gifted Students. RICHARD DE MILLE AND PHILIP
R. MERRIFIELD

Luce's Developments in Mathematical Psychology. ROBERT
W. [surname illegible]

Margulies and Eigen's Applied Programmed Instruction.
J. A. R. WILSON

Fiske and Maddi’s Functions of Varied Experience. JOHN
DELAMATER AND MARQUISA DELAMATER

Dressel’s Evaluation in Higher Education. DOROTHY M. COR-
LETT

pn Ms ve eS 2 RA REM КУШУ, 
ygotsky’s Thought and Language. , Re 

Batters pes and Guidance in Schools. WILLIAM 
COLEMAN ,.... ооо ннен ОА күү: 

Haring and Phillips' Educating Emotionally Disturbed Chil-
dren. ARTHUR LERNER

High's Teaching Secondary Schoo 


FrTOH е Ana tendo roO Sass ope ЛА 
Hansen's The Amidon Elementary School. ROY M. FITCH

Hart's The Psychology of Insanity. ARTHUR LERNER


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 
Vol. XXII, No. 4, 1962


Creativity and Intelligence: Explorations with Gifted Students by
Jacob W. Getzels and Philip W. Jackson. New York: John
Wiley & Sons, 1962. Pp. xvii + 293. $6.50

Despite the enthusiasm of the publisher and the quick public ac-
ceptance of this book, a rather discouraging evaluation of it must
be made in a journal devoted to measurement problems. Its success
is quite understandable as a consequence of the readable style and
wealth of anecdotal material. Even an informed and research-
minded reader will enjoy a sixteen-year-old’s story that begins:
“Freddie Jones was a nice boy. If you don’t believe it just ask him,”
and ends: “Freddie will never amount to much but he will be happy.
How are you doing?” (p. 188). Naive readers, of whom there will
be many, will take pleasure also in the measurement portions of the
book. Informed readers are in for a disappointment.

Of the various possible approaches to research problems, some
are incompatible with others. Attempts to combine the incompatibles
run the risk of disaster. Two such dichotomies are: trait versus type
psychology, and exploration versus hypothesis testing. In trait
psychology, of which factor-analytic studies offer familiar examples,
attention is focused on traits that are assumed to be common to all
members of a given population, in greater or lesser degree. A goal
of research is to differentiate the traits, providing operational defi-
nitions for them that are distinct or mutually exclusive. In type
psychology attention is focused on categories of people. It is assumed
that persons in general fall naturally into classes distinguished by
unique combinations of attributes. The purpose of classification is
to reflect the natural attribute clusters, which may then be treated
as unitary entities, or types. Type psychology has been found less
fruitful than trait psychology in the study of personality.

Getzels and Jackson started out by concerning themselves with
an important problem in the analysis of aptitude traits. They took
as the central issue of their study the reasonable assertion that
IQ measurement is an incomplete and in many ways inadequate
measure of giftedness, and they followed the lead of Thurstone and
others in suggesting that the so-called creative abilities are
what is missed when intellect is equated with IQ. This problem is an
old one. Perhaps the most successful and certainly the most exten-
sive attempt to solve it is that of Guilford, which has been develop-
ing for nearly fifteen years. Getzels and Jackson were well aware
of that and included tests adapted from or related to Guilford tests
in their creativity battery.

Having gone so far they unaccountably forsook trait psychology 
for a tortured exercise in the study of types. Instead of pursuing the 
relationships between traits of creativity and other intellectual and 
personality traits—knowledge about which would be valuable in- 
deed—they attempted to distinguish between creative and intelli- 
gent types of students and to discover clusters of attributes proper 
to each type. The tools they used to effect this classification were 
those very tests that had been developed as measures of traits com- 
mon to all, and the classification was achieved by discarding a 
sizable group of students who were high scorers on both measures. 
Such a procedure is not comparable to taking groups of extreme 
scorers, high and low, on the same measure, which is a conventional 
research procedure. It is rather more like studying two types of 
people, parents and married people, while leaving out the group 
composed of married parents. 

The students were sixth-through-twelfth-graders in a Chicago 
private school. Adequate data were obtained for 449 of a tested 
school population of 533. The mean IQ of the 449 students was 132, 
with a standard deviation of 15 IQ points. 

The intelligence measure involved a Stanford-Binet, Henmon- 
Nelson, or WISC IQ obtained from existing records. The creativity 
measure was a summated score from a somewhat redundant and 
not entirely appropriate collection of five tests representing a very 
limited array of factors of creative thinking. The descriptions of the 
test items as well as the test intercorrelations indicate that verbal 
meaning, the chief component of most IQ measures, was an important
factor in every test. Recent observations reported by Guilford
indicate that, at these high levels of ability, IQ is hardly correlated
with creativity measures. Three of Getzels and Jackson's
eine je bi AE opem correlated with their IQ measure 
one, especially for their RICCA ME ever en 
Dividing both the IQ and the composite creativity measures at the
80th percentile, the investigators made a two-way classification of
the students. This resulted in the creation of four groups: a
high-high group who were in the upper 20% on both measures; a
low-low group who were in the lower 80% on both measures; a
high-low group who were in the upper 20% on the IQ measure and the
lower 80% on the creativity measure; and a low-high group who were
in the lower 80% on the IQ measure and the upper 20% on the
creativity measure.
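The two-way classification is easy to state as a procedure, and the sketch below does so for illustration. It assumes a simple cutoff rule (the score at the 80th percentile position of the sorted list, with ties counted as high), which may differ from whatever rule Getzels and Jackson applied; the function names and the ten-student data set are hypothetical. Group labels give IQ standing first and creativity standing second, as in the review.

```python
from collections import Counter


def cutoff_for_top_fraction(scores, top_fraction=0.20):
    """Return the lowest score that still counts as 'high'."""
    ordered = sorted(scores)
    index = int(round((1.0 - top_fraction) * len(ordered)))
    index = min(index, len(ordered) - 1)
    return ordered[index]


def classify_students(iq_scores, creativity_scores, top_fraction=0.20):
    """Label each student high-high, high-low, low-high, or low-low."""
    iq_cut = cutoff_for_top_fraction(iq_scores, top_fraction)
    creativity_cut = cutoff_for_top_fraction(creativity_scores, top_fraction)
    labels = []
    for iq, creativity in zip(iq_scores, creativity_scores):
        iq_level = "high" if iq >= iq_cut else "low"
        creativity_level = "high" if creativity >= creativity_cut else "low"
        labels.append(iq_level + "-" + creativity_level)
    return labels


# Hypothetical scores for ten students.
iq = [101, 102, 103, 104, 105, 106, 107, 108, 109, 110]
creativity = [9, 1, 2, 3, 4, 5, 6, 7, 8, 10]
groups = classify_students(iq, creativity)
print(Counter(groups))
```

Counting the labels in this way makes the size of the high-high group explicit, which is the figure the reviewers note is missing from the book.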

In such studies, as Torrance has reported, the high-highs may be
as numerous as the high-lows or the low-highs. Analysis of the in-
complete figures presented by Getzels and Jackson indicates that
their high-high group could have been as numerous as the high-low
and the low-high groups combined. Nevertheless, no report of the
number of students in either the high-high group or the low-low
group is given. This strange omission becomes even stranger when
it is realized that if the high-highs were numerous, as they were re-
ported to be in an earlier paper, then the creation of two types of
students by the selection of small groups that did not overlap was
arbitrary and essentially meaningless. It is not type psychology at
all, in the best sense, and it entails a serious loss of information.

By confining their attention to the “highs,” the investigators 
denied themselves the use of about three-quarters of their subjects. 
The exclusion of the high-high group further restricted the study, 
so that ultimately only one-ninth of the subjects were used. The 
‘stated purpose of this drastic reduction was to isolate two types of 
cognitive excellence, but its effect was to manufacture two fictitious 
types of people. Both the loss of information and the unnatural 
Classification could have been avoided by a trait psychology ap- 
proach using a correlational analysis. This would have necessita: 
gathering non-aptitude data for more than the 52 students studied, 
Dut the results might have been well worth the labor. Getaels and 
Jackson dealt with some very interesting non-aptitude variables, 
both psychological and sociological, and a successful analysis of all 
P pores might have made a valuable contribution to 
sychology, sociology, and education. ШЕ. 
Тһе high-low cR was inappropriately labeled the “high in- 
- tellirence group," giving the impressio: 
IQ levels were included. Likewise, the low-high group was mis- 
named the “high creativity group,” giving the impression that all 
those at the highest creativity levels were included. A scant footnote 
(p. 21) to the effect that the high-highs were “of course” excluded 
does not prevent the reader from losing track of the high-high group 
entirely (if indeed he ever realized their existence) as the labels 
“high IQ” and “high creative” are repeated on page after page.
An additional source of confusion against which the reader is not 
adequately protected is the fact that in comparison with ordinary 
school populations almost all of these students were "highs." The 
reader is forced to keep reminding himself that about half of what 
are implied to be low-IQ subjects actually had IQ scores [...].
In spite of a warning (possibly an afterthought) that the results
may not be representative or repeatable, a misleading impression of
generalizability is created throughout the book. The informed reader
will find it difficult to resist. The naive reader is sure to find it
overwhelming.
A second general criticism involves the confounding of descriptive
and predictive methodology. As the title reveals, the study was ex- 
ploratory. The authors speak of being “lost in phenomena without 
relevant explicit concepts to guide [their] needs" (p. ix). Assuredly 
there is nothing wrong with doing a purely descriptive study. Getzels 
and Jackson, however, attempted to improve on exploration by 
bringing in statistical tests at every opportunity, as though to test 
hypotheses, when no hypotheses had been stated. In many instances 
a significant finding could be reported only by dropping the acceptable
level of significance to the 10 per cent or even the 20 per cent
level. In the absence of previous predictions, such "significance" has 
no meaning. The procedure improperly imparts an appearance of 
generality to observations that may be largely or entirely unrepeat- 
able. The fact that this problem is acknowledged in a tardy footnote 
(p. 62) does not help matters. If the investigators knew the procedure
was inappropriate, they should have been even less ready to
make use of it. The same results could have been presented with 
complete propriety and clarity using confidence intervals, as is commonly
done in descriptive studies where parameters are to be estimated.

Thus far criticism has been aimed at the general design of the 
study. Even within the investigators’ frame of reference, however, 
unjustified procedures were used and consequent erroneous findings
were made. 

A discovery widely attributed to Getzels and Jackson is that 
teachers prefer “high IQ's" to “high creatives.” This may be true, 
but it was not indicated by Getzels and Jackson’s observations, even 
though they do report something that sounds very much like it.
Their exact wording is: “the high IQ students are preferred over the
average students by their teachers, the creativity students are not”
[...]. This statement, the import of which is not immediately
clear, arises from a comparison of each of the subgroups in turn
with a larger group of 449 (or 395; the table is ambiguous) “average
students.” It is reported that the teachers took a significantly
greater degree of pleasure in teaching the “high IQ's” than in teaching
the “average students.” There is nothing surprising in that.
They also took a greater degree of pleasure in teaching the “high
creatives” than in teaching the “average students,” but the result
was not statistically significant. The fact that both of these preferences
were in the same direction and of nearly the same magnitude
is ignored. The reader is left with the unjustified impression that the
teachers preferred the “high IQ's” to the “high creatives.”

If this questionable logic is not enough, a closer inspection reveals 
that the reported significant difference in teacher reaction to “high 
IQ's” and “average students” was itself illusory. This “significant”
finding was based on a t test requiring homogeneity of variance, 
while the table reveals that the variances were not homogeneous at
all. The F ratio exceeds five-to-one. When the t test is appropriately
made with an estimate of population variance, the significance vanishes.
Analysis indicates, moreover, that a difference between the
"high IQ's" and the “high creatives” as great as the one reported 
would occur through chance sampling at least one time in four. 
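
The effect of heterogeneous variances on a t test can be illustrated with a short sketch. The summary statistics below are hypothetical and are not taken from Getzels and Jackson's table; they merely pair a small, highly variable group with a large, less variable one, with a variance ratio of five to one. The separate-variance statistic shown is Welch's approximation, which is one common way of handling unequal variances and is not necessarily the computation the reviewers performed.

```python
import math


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Student's t using a pooled variance (assumes equal population variances)."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, n1 + n2 - 2


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch's t using separate variance estimates and approximate df."""
    a, b = var1 / n1, var2 / n2
    t = (mean1 - mean2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Hypothetical figures: a small subgroup and a large comparison group.
small = dict(mean1=11.0, var1=5.0, n1=28)
large = dict(mean2=10.2, var2=1.0, n2=449)

t_pooled, df_pooled = pooled_t(**small, **large)
t_welch, df_welch = welch_t(**small, **large)

print(f"pooled:   t = {t_pooled:.2f}, df = {df_pooled}")
print(f"separate: t = {t_welch:.2f}, df = {df_welch:.1f}")
```

With these invented figures the pooled statistic is roughly twice the separate-variance statistic, because pooling lets the large, homogeneous group dominate the error estimate. A difference that looks significant under the pooled test may therefore fail to reach significance once the variances are estimated separately.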

Contrary to their report, Getzels and Jackson's results suggest 
that teachers do not prefer any group of students to any other 
group. If there were actual differences in the teachers' preferences, 
they were overlooked by this method of analysis. A more sober and 
thorough appraisal would have resulted from using median dichot- 
omies, establishing four groups of students, and performing an 
analysis of variance of the teachers’ ratings. 

The matter would seem to end there, but one point remains. The 
group of "average students" included an unspecified but certainly 
substantial number of students who were in the upper 20 per cent on 
both measures. They were outstanding in both creativity and intelli- 
gence. The group included also a large number of students who 
were in the lower 80 per cent on both measures. This lumping to- 
gether of high-highs and low-lows entirely obscured the feelings of 
the teachers toward the high-highs. It is just possible that, given 
the chance, the teachers might have said their greatest pleasure was 
in teaching students who were excellent in every way. In their long 
promised study of the high-high group, Getzels and Jackson have 
an opportunity to test this hypothesis. 

Limitations of space forestall an exposition of every misstep, but 
one additional kind of error is worth noting. In a table (p. 39) pre- 
senting supposed differences in fantasy production between “high 
IQ's" and “high creatives,” six successive chi-square tests are shown 
on different scorings of the same fantasy responses. The usual prob- 
ability tables for chi square assume that the test will not be repeated 
on correlated aspects of the same observations. Were hypothesis-testing
techniques justified at all here, almost any multivariate
analytic technique would have yielded more interpretable results
than this insupportable repetition of a univariate model. In the absence
of hypotheses, perhaps the best presentation would have
employed confidence intervals for proportions or for differences be- 
tween proportions. 
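
The kind of presentation recommended here can be sketched as follows. The counts are hypothetical and do not come from the table on p. 39; the interval is the simple large-sample (Wald) interval for a difference between two independent proportions, chosen only for brevity.

```python
import math
from statistics import NormalDist


def diff_proportion_ci(x1, n1, x2, n2, level=0.95):
    """Large-sample confidence interval for p1 - p2 (independent groups)."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for 95 per cent
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, diff - z * se, diff + z * se


# Hypothetical counts: 18 of 26 in one group, 9 of 28 in the other.
diff, lower, upper = diff_proportion_ci(18, 26, 9, 28)
print(f"difference = {diff:.3f}, 95% interval = ({lower:.3f}, {upper:.3f})")
```

An interval of this sort reports the size of the difference and its uncertainty without implying that a hypothesis was stated in advance.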

"Throughout the book the reader encounters questionable clinical 
interpretations, incongruous theoretical statements, and gratuitous
research conclusions. An example of the last is a description of the
attributes supposedly held in common by the “high creatives” and a
so-called highly moral group. The commonality of some of these attributes
could have been tested by an analysis of data available in
the study, but the analysis is not presented. 

Minor errors, ambiguities, and inconsistencies in text and tables 
are frequent enough to cause the reader to question the care with
which the study was done. The groups of subjects fluctuate in size 
from comparison to comparison, and the reasons given do not al- 
ways seem entirely appropriate. However, it is not the little in- 
consistencies that make this a remarkable piece of research. It is 
the bald manner in which incompatible and inappropriate procedures 
have been mixed into a muddle whose meaning is largely incompre- 
hensible, whose relevance is frequently doubtful, and whose effect 
more often than not is to exasperate the expert and lead the layman 
astray. 

RICHARD DE MILLE

PHILIP R. MERRIFIELD

University of Southern California 


(Editor). Glencoe, Ill.: The Free Press, 1960. Pp. 293. $7.50. 
This volume, the third by the Behavioral Models Project of the 
Bureau of Applied Social Research, Columbia University, presents 
chapters on information theory, R. D. Luce; learning theory, R. R. 
Bush; and manual tracking, J. C. R. Licklider. It is intended for 


[...] is equal to f(i) plus f(j), and (c) the value f(i) = 1 if p(i) = 1/2.
[...] information $H(x) = \sum_x f(x)\,p(x)$, where the random variable
x = 1, ..., n (symbols). The definition of H is generalized [...]
definitions presented; and [...] = H(x) + H(y), [...] and proved. Luce's
presentation of these points is the major contribution of this book.
[...] on information theory will find their needs met [...].
[...] theory in psychological research is some[...]. Information theory
may be used to analyze frequency data along the lines of a multivariate
statistical analysis, or to assign H values to stimuli depending upon
(a) the sampling procedure
used in generating the stimuli, (b) the way that the experimenter 
or the subject responds to the stimuli, and (c) in either case, the 
way that the experimenter selects the identifications between the ex- 
perimental events and the events in the model. Finally, it may be 
used to measure (assign H values to) some aspect of the subject’s 
behavior or to compare these measures with those of the stimuli. 
However, it is not a theory of behavior but simply a tool for quanti- 
fying and examining data. To predict behavior, the experimenter 
must explicitly or implicitly add behavioral postulates to those of 
information theory. The trouble with the research on information 
theory is that the behavioral postulates are seldom stated and the 
reader is left to guess their nature and source. The research reviewed 
by Luce is no exception. 
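
The quantities mentioned in this review can be made concrete with a brief sketch. It assumes the standard Shannon form f(p) = -log2 p, which is not quoted in the review but satisfies the conditions that survive in the text (f = 1 when p = 1/2, and additivity over independent events). The probability values are invented for illustration.

```python
import math


def surprisal(p):
    """Information f(p) of an event with probability p, in bits (assumed form)."""
    return -math.log2(p)


def entropy(probs):
    """H = sum of p * f(p) over the outcomes with nonzero probability."""
    return sum(p * surprisal(p) for p in probs if p > 0)


# Hypothetical distributions for two independent variables x and y.
px = [0.5, 0.25, 0.25]
py = [0.9, 0.1]
joint = [a * b for a in px for b in py]  # independence: p(x, y) = p(x) p(y)

print(surprisal(0.5))                    # unit of information at p = 1/2
print(entropy(px))                       # H(x)
print(entropy(joint), entropy(px) + entropy(py))  # H(x, y) versus H(x) + H(y)
```

The last line shows the additivity property for independent variables. As the review stresses, such H values only quantify data; nothing in the computation predicts behavior.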

In his survey of mathematical learning theory, Bush includes an 
amusing summary of Hullian theory. “To a mathematician,” he 
writes, “Hull's theory is rather barren.” Furthermore, Hull's “postulates
are really empirical generalizations, and many psychologists
have tried to test these so-called postulates by various kinds of
experiments.”

Bush would either derive relations between response probabilities 
on adjacent trials from axioms based on simple assumptions or 
postulate such a relation outright guided by the possibilities for 
useful mathematical elaboration. The technique may revolutionize 
the field; but the result is not so much a psychological theory as a 
method for analyzing certain kinds of data. Even this noteworthy 
contribution can be oversold. The expert in this area is no more
prepared to originate psychological research than an expert on 
analysis of variance or, for that matter, an expert on information 
theory. As with information theory, the experimenter must explicitly
or implicitly add behavioral postulates to those of the learning
model in order to choose the treatments and identifications, and
must derive the predictions concerning the effects of treatments upon 
the parameters. The only independent variable treated strictly 
within the model is “relative frequency of reinforcement.” And even 
in this case, psychological postulates such as the “event-invariance 
principle” determine the choice of identifications. Bush borrows be- 
havioral postulates from Guthrie, Skinner, Mowrer, Pavlov, Miller, 
and, not least, Hull. Reminiscent of the pot and kettle, these pos- 
tulates appear to be empirically based, ad hoc, mathematically bar- 
ren, and, in the section under motivation, 17 years behind the times 
(the most recent source reference being Hull, 1943). The attempts 
to treat stimulus similarity as an additional source of variation face 
the unsolved problem of measurement. 
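
A relation between response probabilities on adjacent trials of the kind described above can be sketched with a linear operator. The form p' = alpha * p + (1 - alpha) * lam is the usual textbook statement of a Bush-Mosteller operator and is supplied here as an assumption; the review does not give the equation, and the parameter values are invented.

```python
def linear_operator(p, alpha, lam):
    """One trial: move the response probability p a fraction (1 - alpha) toward lam."""
    return alpha * p + (1 - alpha) * lam


def simulate(p0, alpha, lam, trials):
    """Response probabilities over successive trials, all with the same event."""
    path = [p0]
    for _ in range(trials):
        path.append(linear_operator(path[-1], alpha, lam))
    return path


# Hypothetical values: start at 0.2, limit 1.0, learning-rate parameter 0.9.
path = simulate(p0=0.2, alpha=0.9, lam=1.0, trials=50)
print(round(path[1], 4), round(path[-1], 4))
```

The sketch bears out the reviewer's point: alpha and lam are estimated from data, and deciding which experimental events correspond to which operators requires psychological assumptions that lie outside the model.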

The models reviewed by Licklider are probably unknown to most 
psychologists, for none of the references is listed in an APA journal.
Engineering psychologists study manual tracking behavior in terms
of quasi-linear operator models “both as a practical matter in con- 
nection with the design of man-machine systems and as a part of 
behavior theory." The advantage of this approach is that “the con- 
cept of quasi-linearity permits...taking over highly developed 
linear theory (to) study behavior that is obviously non-linear....” 
The parameters of non-linear systems change in response to adjust- 
ments in input. The parameters of a quasi-linear system change now 
and again, but hold constant in between adjustments, or change so 
slowly that they may be treated as constant over reasonably long 
intervals. The reader whose knowledge of the techniques of signal 
analysis is rusty, or is not there to rust, will find the article difficult 
reading. Those who persist will discover a serious and scholarly 
approach to a subject matter that is apt to be of most interest to a 
specialist. 


ROBERT W. EARL
Claremont Graduate School 


Applied Programed Instruction by Stuart Margulies and Lewis D.
Eigen (Editors). New York: John Wiley & Sons, Inc., 1962. Pp.


The reader who scans [...] the jacket claims and the preface to orient
[...] likely to be disappointed. The twenty-five articles, taken
together, do present a reasonably good overview of possible ways in
which programed materials can be used in industry and in the armed
services training systems.

[...] that the book, with 310 pages of [...] the background to answer
precisely [...]:

(a) How much time and money is involved in preparing training
programs involving machines?

(b) Should you use teaching machines or programed books?

[...] what sort of [...] is programed instruction appropriate?

[...] how should the programers themselves be trained?

[...] time will programed instruction require [...] programs employing
conventional methods?

[...] decisions concerning machine vs. books are difficult when most
of the machines available are unreliable [...]


Most educators will be amazed at the size of the training effort in
industry and in the armed services. Over a billion dollars was spent 
last year for schooling of various kinds by the military people. 

The care that is being taken to measure results of training programs
is likely to have an effect on measurement in all educational
endeavors. 

The article by Roe, Lyman and Moon, “The Dynamics of an 
Automated Teaching System” has exciting implications for the whole 
area of research design and educational evaluation. It points a way 
to find the most profitable areas to sample for the maximum in- 
formation concerning the interrelations of different learning situa- 
tion variables. It may be anticipated that as the implications of 
the thesis are worked out there will be a reversal of the trend to take 
a large number of samples at the sample points inherent in the 
present analysis of variance techniques. The emphasis will probably 
be placed on using the sample to help locate the right question by 
what might be considered successive approximations. 

Educators will find helpful the understanding they gain of the 
difference between general education and military and industrial 
training. Working through the sections on Internal Industrial Pro- 
grams, Military Programs, Computer Based Teaching Systems, 
Market Analysis and Economic Considerations, Machines and De- 
vices, and Programed Agencies will underline the differences. Some 
educators may be stimulated to evaluate ways of bridging the gap 
between the world of work and the world of general education. Like 
almost all books on programed learning, this one raises questions 
by implication concerning general school practice. 

J. A. R. WILSON
University of California 
Santa Barbara 


Functions of Varied Experience by Donald W. Fiske and Salvatore 
R. Maddi. Homewood, Illinois: The Dorsey Press, Inc., 1961. 
Pp. vii + 501. $7.95 

Two significant contributions to behavioral psychology are made
in this volume by Fiske and Maddi: first, it is a comprehensive
summary of the theoretical and empirical literature concerning
those aspects of behavior which are pertinent to variation in experience;
second, the authors provide a conceptual framework within
which much of these empirical data may be interpreted. Varied experience,
as the authors employ the term, refers not only to exteroceptive
stimulation, but also to interoceptive and cortical stimulation
originating within the individual.

In summarizing the relevant literature, Fiske and Maddi have
called upon nine contributing authors to discuss topics about which
they are well informed. Joe Kamiya, in a chapter entitled “Behavioral,
Subjective, and Physiological Aspects of Drowsiness and
Sleep,” includes an analysis of the methods of measuring drowsiness 
and sleep, with an evaluation of the electroencephalograph as a 
measuring device in this area. He also considers the work of Kleit- 
man and others in the use of rapid eye movements (REM) as a 
measure of dreaming. 

A chapter by W. I. Welker analyzes the data related to play and 
exploratory behavior in animals; also included is a discussion of the 
indices currently in use as measures of exploratory behavior. Some 
of the difficulties in the measurement of such behavior are also 
considered. 

William N. Dember, in a chapter concerning alternation behavior 
in animals, takes a somewhat different approach. He reviews the 
four major theories of alternation behavior: Reaction Inhibition, 
Stimulus Satiation, Stimulus Change, and Action Decrement. Dember
then presents and interprets data obtained in tests of derivations
from each of these theories.

A fourth chapter, by Robert W. White, includes his systematic 
exposition of the need for competence and effectance as motivational 
concepts, in which he summarizes the major trends in the con- 
ceptualization of motivation in animal psychology, psychoanalytic 
ego psychology, and general psychology. 

James Bieri, in a further chapter, considers the theoretical concept
of cognitive complexity, along with the empirical studies related to
it, as a personality variable. The various methods of measuring cognitive
complexity are assessed, including the Rep Test, the Barron-Welsh
Art Scale, and social perception scores.

A most interesting chapter by John R. Platt is entitled “Beauty:
Pattern and Change.” It is primarily his exposition of the essential
element in those things which we call “beautiful”; that is, “a pattern
that contains the unexpected” (p. 403). Platt draws on material
from a number of fields, including neuroanatomy, psychology, [...].

Additional chapters include a [...] on the sensory and [...]
measuring response [...].


As a means of integrating the material considered in this volume,
Fiske and Maddi, in an introductory chapter, present an interrelated 
set of eight propositions about the functions of varied experience in 
the organism. These propositions are presented as preliminary defi- 
nitions and hypotheses. Of particular interest is their Proposition VI 
concerning the organism’s characteristic level of activation, which 
provides numerous possibilities for empirical testing. Wherever ap- 
plicable, the authors have related their propositions and conceptual 
ideas to those of other researchers and theorists, and have included 
supporting research evidence. A re-evaluation of this conceptual 
framework concludes the volume; in this chapter, Fiske and Maddi 
designate those areas in which their propositions account for the em- 
pirical findings, as well as those areas in which their framework 
needs to be revised and extended. 

On the whole, their propositions are thought-provoking and in 
some cases are quite original. The authors discuss the problems in- 
volved in measuring their theoretical variables, and the problem of 
assessing individual differences which are related to these variables. 

Fiske and Maddi have also compiled an extensive 976-item bibliog- 
raphy, which will be of considerable value in and of itself to some 
researchers in this area. 

JOHN DELAMATER 
MARQUISA DELAMATER 
University of California 
Santa Barbara


Evaluation in Higher Education by Paul L. Dressel and Associates. 
New York: Houghton Mifflin Company, 1961. Pp. 480. 

A discussion of the critical importance of evaluation would be out 
of place in this brief review; yet one is reminded of the comment 
attributed to Mark Twain anent the persistent talking about the 
rain but the sparsity of action concerning it. As the volume's editor 
comments, there has been a lack of any “systematic consideration 
of the many types of judgments required in the conduct and direc- 
tion of colleges and universities.” 

It is to meet this need, among others, that this volume is offered. 
There is evidence of full awareness of the coming problems faced by 
collegiate institutions, problems stemming from changes in the na- 
ture of our society, our total economy, and our world relationships. 
The volume’s author and his associates are eminently qualified to 
present guide lines for a sound evaluation program; there is much
accumulated experience at the Office of Institutional Research, 
Michigan State University. 

A notation concerning the titles of the thirteen chapters contained
in this volume will offer some indication of its nature: the contents
are specific and practical; at the same time they are rich and philosophical
as needed when one is engaged in goal setting and value
judgments: (1) The Essential Nature of Evaluation; (2) The Ob- 
jectives of Instruction; (3) Learning and Evaluation Processes; (4) 
Evaluation in Social Science; (5) Evaluation in the Natural 
Sciences; (6) Evaluation in the Humanities; (7) Evaluation of 
Communication Skills; (8) Testing and Grading Policies; (9) Com- 
prehensive Examination Programs; (10) Selection, Classification, 
and Placement of Students; (11) Evaluation of Instruction; (12) 
Surveys and Studies of Higher Education Needs and Problems; (13) 
Institutional Self-Evaluation. Two appendixes follow: (A) Tech- 
nical Considerations in Measurement and (B) Tentative Outline for 
the Michigan Survey. 

This volume does not suffer from the lack of unity so often found 
under conditions of multiple authorship. It is well organized and it 
cannot be labeled a how-to-do-it manual; it is a “discussion, with 
illustrations, of basic principles, concepts, procedures, and resources.”
It is worthy of a spot of importance and of ready use on
the desk of every person concerned with and involved in the prob- 
lems of collegiate institutions. 

DOROTHY M. COLLETT
La Verne College, California 


The Vocational Interests of Nonprofessional Men by Kenneth E. 
Clark. Minneapolis: The University of Minnesota Press, 1961. 
Pp. xi + 129. $3.75 

In this small volume of 121 pages may be found an impressive 
amount of information on the measurement of vocational interest. 

However, the salient contribution is Clark's report of the activity
preferences of technical workers based on fifteen years of research.
With the support of funds from the Office of Naval Research and
[...] from a number of graduate students, Dean Clark's research
has culminated in the Minnesota Vocational Inventory. Data
[...] 25,000 men were used in the extended research [...].
Clark has set an outstanding example of what [...] in
the [...] of an interest inventory. He [...] constructing and scoring
the inventory, and he has conducted validity studies as well as other
statistical analyses of the instrument. Validity data indicate that
the inventory may predict fairly well achievement in technical
training for students possessing at least average ability. Through the
use of multiple discriminant analysis, comparisons were made of
interest scores with job classification and performance in nine
technical areas in the Navy. Although [...] of the group classifications
were correct in the validation sample, there was considerable
shrinkage in the cross-validation [...].

Major conclusions drawn by Clark are:

1. Differences in the interests of skilled trades groups are sub- 
stantial enough to be used in classification and in counseling. 

2. An interest inventory used with such groups can be scored to 
reflect differences reliably. 

3. The scores derived can be used effectively for counseling or 
classification purposes. 


The Inventory consists of 190 item triads. The respondent is re- 
quired to select the activity he likes most and the one he most dis- 
likes to do. The keys for the various skilled trades were empirically 
derived by comparing the responses of a given skilled field with the 
scores of men in general. From 40 to 60 items seemed to be the 
optimal number for the 17 civilian and 13 Navy keys that have been 
developed. Gulliksen's method was used to obtain items “by select- 
ing each item for which the response of the criterion groups differed 
from the reference group by a given amount or more." From 12 to 
15 percentage points were used by Clark. Nine homogeneous scales 
were also developed by factor analysis, but these are insufficiently 
refined for use at this time. 
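
The item-selection rule described above can be sketched in a few lines. The item names and percentages are hypothetical, the triad format is simplified to a single "liked" response per item, and the 12-point cutoff is the lower bound of the range the review attributes to Clark; none of this reproduces the actual inventory or its keys.

```python
def select_items(criterion_pct, reference_pct, threshold=12.0):
    """Key each item whose endorsement differs between groups by >= threshold points."""
    key = {}
    for item, crit in criterion_pct.items():
        diff = crit - reference_pct[item]
        if abs(diff) >= threshold:
            key[item] = 1 if diff > 0 else -1  # +1: criterion group likes it more
    return key


def unit_score(liked_items, key):
    """Unweighted score: one point for each response in the keyed direction."""
    points = 0
    for item, direction in key.items():
        liked = item in liked_items
        if (direction == 1 and liked) or (direction == -1 and not liked):
            points += 1
    return points


# Hypothetical percentages of each group marking the activity as liked.
criterion = {"repair a radio": 62.0, "keep ledgers": 18.0,
             "drive a truck": 40.0, "set type": 25.0}
reference = {"repair a radio": 41.0, "keep ledgers": 33.0,
             "drive a truck": 38.0, "set type": 14.0}

key = select_items(criterion, reference)
print(key)
print(unit_score({"repair a radio", "drive a truck"}, key))
```

Only the two items that separate the groups by at least 12 points enter the key, and each keyed response counts one point, in line with the reported finding that weighting added nothing.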

Early in the project, it was demonstrated that intuitive, a priori 
scoring keys are vastly inferior to empirically-derived keys. It was
also found that weighted scoring did not differentiate any better
than unweighted scoring.

In the concluding chapter, Clark offers a number of suggestions 
for generally improving interest measures. At the same time, he 
expresses concern for learning more about how interests develop and 
about what their relation is to other aspects of the individual. Users 
of the instrument are encouraged to help in doing further research 
despite the considerable work that has been done to date. 

Counselors and employment managers now have an interest in- 
ventory for use with nonprofessional men that compares favorably 
with the Strong Vocational Interest Blank for people in professional 
occupations. At the same time, the research reported sets a standard 
that needs to be emulated by the developers of other existent interest
inventories.

WILLIAM COLEMAN 


Santa Monica, California 


Thought and Language by Lev Semenovich Vygotsky, Translated 
by Eugenia Hanfmann and Gertrude Vakar. Published Jointly 
by The M.I.T. Press, Massachusetts Institute of Technology, 
and John Wiley & Sons, Inc., New York, 1962. Pp. 168. 

This volume has an interesting history. Its author was born in 
1896. During his student days at the University of Moscow he became
academically competent in the fields of linguistics, social
science, psychology, philosophy, and the arts. In 1924 he began his
systematic work in psychology and died ten years later of tubercu- 
losis, at the age of 38. 

According to Bruner, who presents the Introduction to the book, 
Vygotsky collaborated with Luria, Leontiev, and Sakharov and 
“launched a series of investigations in developmental psychology.” 
Published posthumously in 1934, this work was suppressed two
years later and did not reappear until 1956.

In essence, Vygotsky's views offer a theory of intellectual de- 
velopment which is at the same time a "theory of education." There 
is, to a degree, the application of some tenets of dialectical mate- 
rialism. His theoretical discussions are based on a “concept-forma- 
tion study” conducted by the “double stimulation” type of method 
in such areas as egocentric speech, the interaction of school instruc- 
tion and mental development, and a comparison of the develop- 
ment of “scientific” concepts learned in school with the informally 
acquired “natural” concepts. It would appear that, in toto, the volume
attempts to demonstrate “that the relationship between thought
and speech changes profoundly during the child's mental development.”

[...] volume under review nor the [...]


CHARLES J. MILLER 
La Verne College 
and 


Lawndale Public Schools, California 


Counseling and Guidance in Schools by C. H. Patterson. New York:
[...], 1962. Pp. viii + 382. $4.75


[...] the major issues, he takes a
stand that seems moderate to the reviewer, perhaps because the
[...] the reviewer's views. Although some
readers may not agree with Patterson in the positions he takes, he
stimulates their thinking about such problems as: (a) the extent to 
which counselors should do personal counseling; (b) the adequacy 
of the qualifications of the counselor; (c) the value of short-term 
NDEA institutes; (d) the effectiveness of group counseling; (e)
abuses in testing; and (f) the nature of professional relationships 
with other school personnel. 

A good cross-section of the writings in the pupil personnel area is 
cited by Patterson, and the reader is referred to many other recent 
publications for additional information. Thus, the student is able to 
compare Patterson’s viewpoint with the opinions of other leaders in 
pupil personnel work. At the same time, a well-integrated and con- 
sistent philosophy is expressed. 

Patterson takes issue with those who advocate that every
teacher should be a counselor. He points out that many teachers do 
not have the personality, background, or interest to be effective 
counselors. To make them feel that they should be tends to cause 
them to develop guilt feelings. He emphasizes that educational-vocational
counseling generally entails emotional involvement. The
importance of the self-concept in vocational choice is described 
along with the theories of Ginzberg and Super. 

Instructors seeking a stimulating, well-written book for the in- 
troductory course on Guidance Services will do well to give careful 
consideration to this text. Experienced workers will find helpful the 
up-to-date discussion of most of the major issues in counseling and 
guidance. What higher praise can one offer a book than expressing 
the wish he had written the book himself? 

WILLIAM COLEMAN 
Santa Monica, California 


Educating Emotionally Disturbed Children by Norris G. Haring 
and E. Lakin Phillips. New York: McGraw-Hill Book Company,
Inc., 1962. Pp. xv + 322. $6.50 

This book presents a lively and readable discussion for all who 
are interested in the psychological and educational maturation of 
emotionally disturbed children. It is the result of an understanding 
of the disciplines of psychology and education—their complemen- 
tary and supplementary natures—their limitations as well as their 
potentialities in the field of emotionally disturbed children. Through- 
out this volume there is a realization that those who comprise the 
child’s vital adult relationships—parents and teachers—are the focal 
points of growth and awareness. 

As a result of several years of collaboration between the authors 
on a number of problems related to emotionally disturbed and other 
exceptional children in the public school milieu, they state in the
preface: “It becomes unnecessary to separate the child from his 
home environment (in the overwhelming number of instances) in 
order to help him with his emotional problems, just as it is unneces- 
sary to neglect his educational progress in order to bring relief to his 
personality difficulties.” Whatever practical suggestions they offer 
can be learned very readily by interested adults. 

The authors emphasize that an emotionally disturbed child has a 
sizable “failure pattern” in his life experiences. A “success pattern” 
is not part of his contact with life. Furthermore, emotional dis- 
turbance is not a respecter of sex, religion, racial background, or 
intelligence. Also, there are many popular misconceptions regarding 
the emotionally disturbed child ranging from “he was born that 
way,” “institutionalization can better handle the child under all 
circumstances than the home can,” to “he will grow out of it.” Like 
many voiced opinions, whatever kernel of truth does exist in each 
of the above does not form a solid core of reality for wholesome 
treatment and education. 

In setting up their rationale, the authors deal with the central 
problem of structure. Ordered educational and social-emotional
experiences within the school are the key here since this means that both
the child's emotional difficulties and related problems of education 
receive a simultaneous consideration. 

Chapter 2, which is entitled “Selected Review of Educational 
Practices,” presents a brief overview of the major contributions 
dealing with causation, behavior symptoms, and treatment methods.
Bettelheim, Pearson, Redl, Newman, Slavson, Rogers, Axline,
[...] D'Evelyn, Prescott, Cruickshank, and others [...]

An examination of recent investigations points up the importance
of the classroom experience in terms of [...] expectations
in the learning process as over and against a completely
permissive environment. Both teacher personality and teacher competence
are vital factors in classroom management. Other school
personnel [...] visiting teacher, psychologist, principal, coordinator
of [...] education, school physician, teaching assistant,
and [...] teacher can all contribute their observations
about each child as part of gaining a total awareness of the
child's problems.


Much can be said about this valuable work. But suffice it to say [...]
central philosophy [...] child has a potential which skillful
handling and patient understanding [...]. The theory presented here is
clearly and simply [...] it most helpful to understand what [...].
This volume will probably find a ready and wide audience. It
should lead to many constructive suggestions and experiences in the
field of education.

ARTHUR LERNER 

Los Angeles City College 


Teaching Secondary School Social Studies, by James High. New 
York: John Wiley & Sons, Inc., 1962. Pp. 481. 

The author is to be commended for the special manner in which 
he handles the various academic areas included in the vast, massive 
field called Social Studies. He states the dual purpose of meeting 
not only the needs of the student in training but also of providing 
a syllabus that can readily be adapted to in-service training pro- 
grams. The purpose of this review is not to judge the academic areas 
but to look more closely at the evaluative processes advocated by 
the author. 

It is refreshing to discover someone who will advance the idea in 
a straightforward manner that education should be looking at long 
range evaluation and at subjective criteria concerned with behavior. 
The author gives the names of various tests that could be used in 
this area as well as in the area of achievement. It is also of im- 
portance that consideration of effectiveness of teachers is included. 

There is an interesting chapter on student teaching or internship.
This leaves much to be desired, not in terms of the thought-provok- 
ing inclusions, but more in the limited way in which it is handled. 
The area of creativity is sorely lacking in this book, but all in all the 
general organization and widespread treatment is well done in spite 
of the lack of depth in some areas which this reviewer might have 
desired. 

The book is academically written and presented somewhat at the expense
of general aesthetics as far as the cover, illustrations, and
pictures are concerned. This book’s primary usefulness is as
a text or a syllabus, and it meets the criteria the author set for himself.

Roy M. Fitch
San Fernando Valley State College 


The Amidon Elementary School, by Carl F. Hansen. Englewood 
Cliffs, New Jersey: Prentice-Hall, Inc., 1962. Pp. 252. 

This book might better be termed a report on an educational 
philosophy practiced in a single school in the District of Columbia. 
The report is pursued with great vigor along the lines of returning 
education to the classroom. Throughout the book an attempt is
skillfully made to blend the desire for academic achievement with 
many of the newer concepts of group dynamics. The book is seriously
lacking in good research technique and tangible results.

First, the evaluation by pupil and parents is simply a series of 
subjective testimonials. The sixteen teachers engaged at this school 
also submit subjective treatises on the worthiness of the experiment. 
The most objective data are achievement test results which would 
indicate that the Amidon School is somewhat ahead of the norms of 
achievement tests used. In terms of I.Q., the Amidon group shows 
a range which might be expected from any chance selection; however,
it is not so clear what the median I.Q.'s of the various grade
levels were.

The author time and again expresses the idea that the youngsters 
who were voluntarily transported by parents from outside of the 
normal service area are not of outstanding mental ability. How- 
ever, the very fact that parents are anxious enough to furnish such 
transportation may indicate a variable which has been sorely overlooked.
The whole basis for the argument in favor of the Amidon
Elementary School is based on the position of the school as a whole
in terms of the achievement test results compared with national
norms. However, it must be pointed out that there is no starting
point from which to make a comparison. In other words, how good
were the youngsters before they participated in the experiment?

Statistical comparisons, which are in terms of grade place equiva- 
lents or quartile rankings, leave much to be desired. The value of 
the book can be expressed in the terms of the writer, "Though not
[...] proof of superiority of the Amidon system, the tests
[...]" [...] statement contains [...] contradiction of the entire book.
[...] this book is to be found in the zeal and determination
[...] to better American education. It is a sad
[...] worthy objective is not backed by a comprehensive [...]

Roy M. Fitch
San Fernando Valley State College 


The Psychology of Insanity by Bernard Hart. [...]: [...] University
Press, 1962. Pp. xi + 127. $1.25

This classic on insanity first made its appearance in 1912. Though
[...] fifth edition, the present volume is the first paperback [...].
The constant reprintings in England and America indicate the appeal
the book has maintained through the years.


It is advisable to keep in mind that the main aim of the book was
to indicate how the work of Janet, Freud, Jung, and others served
to illuminate the normal mind and the mentally disturbed one. 
Freud's principles at the time when this book first appeared were 
uniformly held by the analytic school of thought—though differ- 
ences were to break out a short while later. 

There is an interesting historical bit of information here as well 
as a lucid account of the modern concept that “sanity” and “insanity”
are not divided into clear-cut areas; rather they shade and
gradually blend into one another. 

One leaves the reading of this book with a feeling that all inter- 
ested people can profit from its contents. 

ARTHUR LERNER 
Los Angeles City College 




